Discriminant Saliency Network with a trainable neuron model

Biological vision has long been a source of inspiration for object recognition algorithms. The introduction of the backpropagation algorithm established a framework for the design of neural networks and was highly successful for a number of recognition problems. However, sigmoidal neural networks are a very coarse model of the neurophysiology of the visual system, and they underperform state-of-the-art object recognizers from computer vision.

In this work, we introduce a recognition architecture that more realistically models the neurophysiology of vision. It is based on a computational model of simple and complex cells in the visual cortex (areas V1, V2, V4), and allows the tuning of each cell for optimal discrimination in the context of a given recognition problem. We investigate the biological plausibility of statistical inference and learning, tuned to the statistics of natural images. It is shown that a rich family of statistical decision rules, confidence measures, and risk estimates can be implemented with this network. In particular, different statistical quantities can be computed through simple rearrangement of lateral divisive connections, non-linearities, and pooling. It is then shown that a number of proposals for the measurement of visual saliency can be implemented in a biologically plausible manner through such rearrangements. This enables the implementation of biologically plausible feed-forward object recognition networks that include explicit saliency models. The potential of combined attention and recognition is illustrated by replacing the first layer of the HMAX architecture with a saliency network. Various saliency measures are compared to investigate whether 1) saliency can substantially benefit visual recognition, and 2) the benefits depend on the specific saliency mechanisms implemented. Experimental evaluation shows that saliency does indeed enhance recognition, but that the gains depend on the saliency mechanism used. The best results are obtained with top-down mechanisms that equate saliency to classification confidence.

The basic concept is illustrated in the following figure, for an object recognition problem where the target is the class of airplanes. Given a set of example images from this class, and a set of examples from the null hypothesis (in this case any object other than a plane), the visual system relies on a set of bandpass (e.g. Gabor) filters to extract visual features characteristic of the two classes. The generalized Gaussian distributions (GGDs) that best fit the distributions of filter responses under the two hypotheses are then estimated. Given a new image, the corresponding features are extracted, and a log-likelihood ratio (LLR) is computed, using these GGDs. Thresholding this quantity then produces a binary map that indicates the locations of the target within the visual field.
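
The pipeline described above (filter responses, GGD likelihoods, LLR, threshold) can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: it assumes the standard zero-mean GGD parametrization with scale `alpha` and shape `beta`, and the function names, parameter values, and toy response map are all hypothetical. Estimating the GGD parameters from training images and extracting the filter responses are omitted.

```python
import math

def ggd_log_pdf(x, alpha, beta):
    """Log-density of a zero-mean generalized Gaussian (scale alpha, shape beta).

    p(x) = beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|x| / alpha) ** beta)
    """
    log_norm = math.log(beta) - math.log(2.0 * alpha) - math.lgamma(1.0 / beta)
    return log_norm - (abs(x) / alpha) ** beta

def llr_map(responses, target, null):
    """Per-location log-likelihood ratio: log p(x | target) - log p(x | null)."""
    return [[ggd_log_pdf(x, *target) - ggd_log_pdf(x, *null) for x in row]
            for row in responses]

def threshold(llr, tau=0.0):
    """Binary map: 1 where the LLR exceeds the threshold tau, 0 elsewhere."""
    return [[1 if v > tau else 0 for v in row] for row in llr]

# Hypothetical (alpha, beta) pairs, as if already fit to the two classes.
target_params = (2.0, 1.5)   # target class: broad response distribution
null_params = (0.5, 0.8)     # null hypothesis: responses concentrated near 0

# Toy 2x3 map of responses from a single bandpass filter (made-up values).
responses = [[0.1, -0.2, 3.0],
             [0.0, 2.5, -0.1]]

binary_map = threshold(llr_map(responses, target_params, null_params))
print(binary_map)
```

With these made-up parameters, only the two large-magnitude responses are more plausible under the target GGD than under the null GGD, so only those locations are marked as target.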

One of the interesting results of this work is that, for GGD stimuli, many of the fundamental computations of probabilistic inference and learning can be implemented with the standard neurophysiologic model of visual cortex. The following table summarizes a number of statistical risks that can be computed with this biological architecture.
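
As a worked illustration of why GGD statistics are convenient here, consider the LLR for a single filter response $x$. The notation is an assumption (the standard zero-mean GGD with scale $\alpha_i$ and shape $\beta_i$ under hypothesis $i \in \{0, 1\}$) and is not necessarily that of the paper:

```latex
p(x \mid i) = \frac{\beta_i}{2\,\alpha_i\,\Gamma(1/\beta_i)}
              \exp\!\left[-\left(\frac{|x|}{\alpha_i}\right)^{\beta_i}\right],
\qquad
\log\frac{p(x \mid 1)}{p(x \mid 0)}
  = \left(\frac{|x|}{\alpha_0}\right)^{\beta_0}
  - \left(\frac{|x|}{\alpha_1}\right)^{\beta_1}
  + \log\frac{\beta_1\,\alpha_0\,\Gamma(1/\beta_0)}{\beta_0\,\alpha_1\,\Gamma(1/\beta_1)}
```

The response enters only through its rectified magnitude $|x|$, rescaled and raised to a power, plus a constant that depends only on the fitted parameters. Under this parametrization, the LLR therefore reduces to rectification, a power-law non-linearity, and a difference of two such terms.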

The figure below illustrates the benefits of saliency and cell tuning for object recognition. It refers to the toy task of underline bar detection, where the goal is to distinguish underlined from non-underlined characters. It compares the performance of three networks: HMAX, which has no saliency; a rare feature detector (RFD), a network with bottom-up saliency; and the log-likelihood ratio (LLR) network, whose units are tuned for underlined character detection using top-down saliency. The underline bar is salient in the top-down sense, since it is the only part that distinguishes the target from the non-target examples. LLR units produce a strong response to underline bars (plausible under the target hypothesis, implausible under the non-target hypothesis) and a weak response to everything else (equally plausible, or implausible, under the two hypotheses). The network has thus learned that horizontal bars are discriminant features for the detection of underlined characters, and are therefore salient. None of the other networks (HMAX, RFD) has this property.

The figure below presents the results of a comparison on the Caltech101 dataset. The performance of networks that compute different risks is compared. The proposed network corresponds to the left-most bar, while HMAX, the popular saliency model by Itti and Koch, and a sigmoid neural network are shown towards the right.

More details are available in the paper below.

Selected Publications:
  • Biologically Plausible Saliency Mechanisms Improve Feedforward Object Recognition
    Sunhyoung Han and Nuno Vasconcelos
    Vision Research, vol. 50(22), 2295-2307, October 2010
    [pdf] [doi:10.1016/j.visres.2010.05.034]

Contact: Sunhyoung Han, Nuno Vasconcelos