Anomaly Detection and Localization in Crowded Scenes

This project is part of our effort to solve problems in the analysis of densely crowded environments. Following our previous work on classifying crowd states, segmenting videos into components, estimating crowd size, and tracking objects in crowds, the goal here is to detect deviations from normal crowd behavior. The work is motivated by the ubiquity of camera surveillance systems, the challenges of modeling crowd behaviors, and the importance of automatic crowd monitoring for various applications.

Anomaly detection is an active area of research in its own right. Various approaches have been proposed for both crowded and non-crowded scenes. Existing approaches focus exclusively on motion information and ignore abnormalities due to variations in object appearance. This makes them blind to abnormalities that do not involve motion outliers, e.g., a truck that crosses a bridge with weight restrictions. Furthermore, descriptors such as optical flow, pixel change histograms, and other traditional background subtraction operations are difficult to apply to crowded scenes, where the background is by definition dynamic, clutter is widespread, and occlusions are complicated.

Anomaly detection and localization can be broken down into two sub-problems: 1) how to characterize crowd behaviors, and 2) how to measure the "anomaly score" of a specific behavior. For the first, we propose to model motion patterns in crowds with a mixture of dynamic textures (MDT), a unified description that captures both the appearance and the dynamics of visual processes. For the second, instead of modeling anomalous behavior directly, a model of normalcy is first learned, and the "anomaly score" of an observation is then computed by measuring its deviation from that model. Specifically, two components are proposed to reflect normalcy from different perspectives.
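To make "appearance and dynamics" concrete: a single dynamic texture is commonly defined as a linear dynamical system, in which an observation matrix C encodes appearance and a transition matrix A encodes dynamics. The sketch below fits one such component to a patch sequence with the standard SVD-based least-squares procedure. It is only an illustration under stated assumptions: the function name `fit_dynamic_texture`, its arguments, and the choice of fitting procedure are not taken from this page, and a full MDT would learn a mixture of such components rather than a single one.

```python
import numpy as np

def fit_dynamic_texture(Y, n_states):
    """Fit one dynamic texture (linear dynamical system) to a patch sequence.

    Y        : (D, T) array whose columns are vectorized frames of a patch.
    n_states : dimension of the hidden state (assumed, user-chosen).
    Returns (A, C, X, mean) with Y approximately equal to C @ X + mean.
    """
    Y = np.asarray(Y, dtype=float)
    mean = Y.mean(axis=1, keepdims=True)          # temporal mean frame
    U, s, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    C = U[:, :n_states]                           # appearance: observation matrix
    X = s[:n_states, None] * Vt[:n_states]        # hidden state sequence, (n, T)
    # Dynamics: least-squares solution of X[:, t+1] = A @ X[:, t]
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    return A, C, X, mean

# Hypothetical usage on a random 25-pixel patch observed over 40 frames
rng = np.random.default_rng(0)
A, C, X, mean = fit_dynamic_texture(rng.standard_normal((25, 40)), n_states=3)
```

Here `C` and `mean` describe what the patch looks like, while `A` describes how it evolves over time, which is why one model can flag both appearance and motion outliers.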

In the temporal component, the visual field is divided into overlapping regions, and an MDT is learned for each region from all videos of normal scenes. For every observation location in a test frame, the normalcy likelihood of a spatio-temporal patch centered at that location is computed under the mixture model learned for its nearest region. The temporal anomaly map is then formed from the negative log normalcy likelihood at each location.
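The temporal scoring step can be sketched as follows. This is a simplified illustration, not the method's actual likelihood: a diagonal-covariance Gaussian mixture stands in for the MDT, each observation location is assumed to come with a precomputed feature vector, and all names (`mixture_log_likelihood`, `temporal_anomaly_map`, `region_centers`, `region_models`) are hypothetical.

```python
import numpy as np

def mixture_log_likelihood(x, weights, means, variances):
    """Log-likelihood of x under a diagonal-covariance Gaussian mixture.

    Stand-in for the MDT likelihood (assumption): weights is (K,),
    means and variances are (K, D), x is (D,).
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    log_comp = np.log(weights) + log_norm + log_exp   # (K,)
    m = np.max(log_comp)                              # log-sum-exp for stability
    return m + np.log(np.sum(np.exp(log_comp - m)))

def temporal_anomaly_map(features, region_centers, region_models):
    """Negative log normalcy likelihood at every observation location.

    features       : (H, W, D) array, one descriptor per location.
    region_centers : (R, 2) array of (row, col) centers of the regions.
    region_models  : list of R tuples (weights, means, variances).
    """
    H, W, _ = features.shape
    centers = np.asarray(region_centers, dtype=float)
    amap = np.empty((H, W))
    for r in range(H):
        for c in range(W):
            # Score the location under the model of its nearest region
            d2 = np.sum((centers - np.array([r, c])) ** 2, axis=1)
            w, mu, var = region_models[int(np.argmin(d2))]
            amap[r, c] = -mixture_log_likelihood(features[r, c], w, mu, var)
    return amap
```

Large values in the returned map mark locations whose descriptors are unlikely under the normalcy model of the nearest region.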

In the spatial component, a global mixture model is learned using only the patches around the current frame. Using a discriminant center-surround saliency approach for mixtures of dynamic textures, a saliency value is computed at each location, yielding a saliency map (the spatial anomaly map). Because the saliency is computed with respect to mixtures of dynamic textures, this map highlights regions that are least similar to their surrounds in terms of both appearance and motion. The surrounds serve as the implicitly assumed normalcy reference, so these regions are the most likely to be abnormal.
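The center-surround idea can be illustrated with a much simpler stand-in. The sketch below does not implement discriminant saliency over dynamic textures; it assumes a single scalar feature per location, fits a 1-D Gaussian to the center window and another to the surrounding ring, and uses the KL divergence between the two as the saliency value. The function names, the window radii, and the `eps` regularizer are all hypothetical.

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two 1-D Gaussians."""
    return 0.5 * (np.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def center_surround_saliency(feat, center_radius=1, surround_radius=3, eps=1e-6):
    """Saliency map from the mismatch between center and surround statistics.

    feat : (H, W) array of scalar features (assumption; the method
           described in the text compares dynamic texture models instead).
    Assumes surround_radius > center_radius and an image larger than
    the center window, so the surround is never empty.
    """
    feat = np.asarray(feat, dtype=float)
    H, W = feat.shape
    sal = np.zeros((H, W))
    for r in range(H):
        for c in range(W):
            r0, r1 = max(0, r - surround_radius), min(H, r + surround_radius + 1)
            c0, c1 = max(0, c - surround_radius), min(W, c + surround_radius + 1)
            window = feat[r0:r1, c0:c1]
            rows, cols = np.mgrid[r0:r1, c0:c1]
            in_center = ((np.abs(rows - r) <= center_radius)
                         & (np.abs(cols - c) <= center_radius))
            center, surround = window[in_center], window[~in_center]
            # eps keeps the variances strictly positive in flat regions
            sal[r, c] = gaussian_kl(center.mean(), center.var() + eps,
                                    surround.mean(), surround.var() + eps)
    return sal
```

A location scores high when its center statistics differ from those of its surround, which mirrors the role of the surround as the local normalcy reference.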

In this way, a hierarchy of multi-scale temporal and spatial anomaly maps is computed by varying the size of the overlapping regions in the temporal component (the support regions for the normalcy models) and the size of the surrounds in the spatial component. Finally, all anomaly maps are discriminatively integrated by conditional random fields (CRFs) to produce the final anomaly prediction.
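As a rough illustration of discriminative integration, the sketch below combines a list of anomaly maps with a per-pixel logistic model trained on labeled pixels. This is deliberately not a CRF: it ignores the pairwise terms that couple neighboring locations, so it corresponds at most to a unary-only simplification. The function names, learning rate, and step count are assumptions.

```python
import numpy as np

def fit_fusion_weights(maps, labels, lr=0.1, steps=500):
    """Learn per-map weights by logistic regression (gradient descent).

    maps   : list of S arrays of shape (H, W), one per scale/component.
    labels : (H, W) array of 0/1 pixel-level ground truth.
    """
    X = np.stack([np.asarray(m, dtype=float).ravel() for m in maps], axis=1)
    y = np.asarray(labels, dtype=float).ravel()
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # predicted anomaly probability
        grad = p - y                              # gradient of the log-loss
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def fuse_anomaly_maps(maps, weights, bias):
    """Per-pixel anomaly probability from a weighted sum of the maps."""
    stack = np.stack([np.asarray(m, dtype=float) for m in maps], axis=0)
    logits = np.tensordot(weights, stack, axes=1) + bias   # (H, W)
    return 1.0 / (1.0 + np.exp(-logits))
```

In practice the maps would likely need to be normalized to comparable ranges before fitting, and a spatial smoothness term would be needed to recover the behavior of a CRF.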

Dataset:

UCSD Anomaly Detection Dataset
To evaluate the performance of the proposed approach, we also introduce a dataset consisting of videos of a crowded pedestrian walkway with manually collected frame-level and pixel-level ground truth. [dataset]

Results:

Anomaly Detection Results
Please refer to the publication for qualitative evaluation. Here we present visual results comparing the performance of the proposed approach with that of other anomaly detection schemes that were state of the art at the time of testing. [results]

Publications:

Anomaly Detection in Crowded Scenes
Vijay Mahadevan, Weixin Li, Viral Bhalodia and Nuno Vasconcelos.
In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
San Francisco, CA, 2010. IEEE© [ ps | pdf | BibTeX ]

Anomaly Detection and Localization in Crowded Scenes
Weixin Li, Vijay Mahadevan and Nuno Vasconcelos
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Vol. 36, No. 1, pp. 18-32, January 2014
[ pdf | appendix ( ps | pdf ) | BibTeX ]

Contact: Weixin Li, Vijay Mahadevan, Nuno Vasconcelos
