This page presents a collection of results
obtained with region-of-interest (ROI) coding, using ROI masks produced by object based ROI detection.
The basic idea is to enable image coders to become experts on the visual
concepts that are most important to their users. This allows
the allocation of more bits to the image areas where these concepts appear, and
can be useful for a number of applications. The process
of training the coder consists of simply collecting (e.g. on
search) a number of examples (about 40 in the examples below)
of the concept of interest.
Examples of ROI-based compression of cluttered scenes
Consider the following image, composed of a number of different objects. A semantic coder
was trained for each of the five major visual classes in the image:
faces, cars, the Capitol, trees, and street lamps.
Once the coder has been trained to recognize these classes, the user can specify
a class of interest. Image areas containing that class are
deemed salient by the coder and allocated more bits. This could be interesting for photo- or
video-sharing over wireless networks. The user can quickly
transmit a number of pictures that emphasize the components of the scene of
interest (e.g. the user and a monument in the background).
The receiver of the pictures can then quickly select one among those made
available and request a high-quality replica of that photo alone.
following are examples of ROI-coded images, using a JPEG2000 encoder and a
saliency detector developed in this project. These examples
present a magnification of the image above, after it has been coded with regular
JPEG2000 (left) and ROI-based JPEG2000 (right). The same number
of bits was assigned to the entire image, and training was based on 40 examples collected from
the web for each object class. The user specified
the "faces" (row 1 and 2), "cars" (row 3), "Capitol" (row 4), "trees"
(row 5), and "street lamps" as the classes of interest. Note the highest fidelity
of the regions
containing these visual concepts.
Examples of ROI-based compression of images containing street signs
We next show examples of regular JPEG coding (left) vs ROI coding (right) based
on the "street sign" class. In this trial the street-sign
coder was learned from
40 examples collected from the web. This coder could be useful for a
guidance system: the user simply points a cell
phone at the street
sign, an image is compressed and transmitted to a base station, where an OCR
system is used to determine the user
location and transmit
back a number of suggestions (places to visit, nearby stores, etc.). The
allocation of a large percentage of the bits to
covered by the street sign enables accurate performance of the OCR system, at
an overall reduced bit-rate. In the examples below
this overall rate was made low enough to make the non-ROI coded street signs
(or some of their details) imperceptible to a human viewer.
Note how the details of the ROI-coded
signs are perceptible at the same overall bit rate.
Examples on the
Caltech object database
The next set of examples was produced by a ROI coder trained on human faces. The
obvious application would be video-telephony over narrow-band networks.
Note how the coder assigns higher quality to the face area, at the cost of loss
of (irrelevant) detail in the background.