|Semantic Image Classification|
The traditional model for image retrieval is query-by-example: the user provides a query image and retrieval consists of finding the closest database matches to it. Query-by-example is sometimes ineffective, since 1) it is not always easy to find a good query image for a given target (e.g. an image representative of a concept such as "outdoors" or an object such as "chair"), and 2) it tends to narrow the definition of similarity to visual similarity, i.e. two images are similar if and only if they share the same patterns of color, texture, and object shape. As a result of these two limitations, a "semantic gap" is usually associated with query-by-example: for example, a user provides a query image of a beach scene, but the matches returned by the retrieval system include images from various other classes that also contain large areas of sky.
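A minimal sketch of query-by-example may help make the model concrete. The feature choice below (a toy per-channel color histogram) and the function names are illustrative assumptions, not the project's actual descriptors; it is exactly this reduction of "similarity" to color/texture statistics that produces the semantic gap described above.

```python
import numpy as np

def extract_features(image, bins=8):
    # Toy feature: concatenated per-channel color histogram, normalized.
    # A real system would use richer color/texture/shape descriptors.
    hist = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
            for c in range(image.shape[-1])]
    h = np.concatenate(hist).astype(float)
    return h / h.sum()

def query_by_example(query, database, k=3):
    # Rank database images by L2 distance between feature vectors
    # and return the indices of the k closest matches.
    q = extract_features(query)
    dists = [np.linalg.norm(q - extract_features(img)) for img in database]
    return np.argsort(dists)[:k]
```

Two images with similar color statistics (beach sand vs. desert, sea vs. sky) land close together under such a metric even when their semantic content differs, which is the failure mode the text describes.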
The performance of a retrieval system can be improved by augmenting it with semantic classifiers: classifiers tuned to semantic concepts that have previously been identified as important. The user can then simply include a textual component in the query, where those semantic concepts are explicitly requested (e.g. that the image should be of an outdoor scene). While some semantics are arguably useful for all applications (e.g. the ability to find people, faces, etc.), most are user and application specific. The problem faced by semantic retrieval systems is therefore how to support a large range of semantic concepts. While it is not difficult to include a face detector or recognizer in a retrieval system, it is significantly more complex to support the hundreds of thousands of concepts that may, at some point, be requested by an individual user.
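One simple way to combine the textual and visual components of such a query is to filter the database by a concept classifier's scores and then rank the survivors visually. The function below is a hedged sketch of that idea: the concept scores are assumed to come from some pre-trained semantic classifier (e.g. an "outdoors" detector), and the threshold and feature representation are illustrative choices.

```python
import numpy as np

def semantic_retrieval(query_feat, db_feats, concept_scores, threshold=0.5, k=3):
    # Keep only images whose concept probability (from a hypothetical
    # pre-trained semantic classifier) exceeds the threshold, then rank
    # the surviving candidates by visual distance to the query.
    candidates = [i for i, s in enumerate(concept_scores) if s >= threshold]
    dists = [(np.linalg.norm(np.asarray(query_feat) - np.asarray(db_feats[i])), i)
             for i in candidates]
    return [i for _, i in sorted(dists)[:k]]
```

The scalability problem the paragraph raises is visible here: this scheme needs one trained classifier per supported concept, which is why supporting very large concept vocabularies requires a more efficient learning strategy.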
This project addresses the problem by posing it as one of "building classifiers from classifiers". The basic representation is the minimum-probability-of-error retrieval model, in which each image is represented as a probability density in a suitable feature space. Concepts are represented hierarchically, and the densities of higher-level concepts are learned from those of lower-level ones. An efficient hierarchical mixture learning algorithm is used, which requires processing image data only to estimate the individual image densities (individual images are the leaves of the tree). Above that level, all densities are estimated directly from the parameters of their children, a process that is extremely efficient. Since the individual densities are required for query-by-example anyway, adding the semantic hierarchy does not significantly increase the overall learning complexity. Learning is therefore scalable in the number of semantic concepts, making it possible to support large vocabularies.
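To illustrate how a parent density can be formed from child parameters without revisiting image data, here is a minimal sketch under the assumption that each node's density is a Gaussian mixture. It shows only the simplest step, pooling the children's components into a prior-weighted union; the project's actual algorithm would additionally compress this pooled mixture (e.g. via hierarchical EM over the component parameters), but even this step makes the efficiency argument clear: no pixels are touched.

```python
import numpy as np

def pool_children(children, child_priors=None):
    """Form a parent Gaussian mixture directly from child mixture parameters.

    Each child is a dict with 'weights', 'means', 'covs'. The parent is
    the prior-weighted union of all child components; no image data is
    revisited. A production system would follow this pooling with a
    parameter-level compression step to keep the parent mixture small.
    """
    n = len(children)
    if child_priors is None:
        child_priors = np.full(n, 1.0 / n)  # assume equally likely children
    weights, means, covs = [], [], []
    for prior, child in zip(child_priors, children):
        weights.append(prior * np.asarray(child['weights'], dtype=float))
        means.append(np.asarray(child['means'], dtype=float))
        covs.append(np.asarray(child['covs'], dtype=float))
    return {'weights': np.concatenate(weights),
            'means': np.concatenate(means),
            'covs': np.concatenate(covs)}
```

Because each level of the tree is estimated from the (small) parameter sets of the level below, the cost of adding a concept hierarchy is negligible next to the per-image density estimation that query-by-example already requires.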