“Abstract – We propose to model the traffic flow in a video using a holistic generative model that does not require segmentation or tracking. In particular, we adopt the dynamic texture model, an auto-regressive stochastic process, which encodes the appearance and the underlying motion separately into two probability distributions. With this representation, retrieval of similar video sequences and classification of traffic congestion can be performed using the Kullback-Leibler divergence and the Martin distance. Experimental results show good retrieval and classification performance, with robustness to environmental conditions such as variable lighting and shadows.”i
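The abstract compares models with the Kullback-Leibler divergence. The sketch below shows only a building block: the closed-form KL divergence between two multivariate Gaussians, in Python/NumPy. The divergence between two full dynamic-texture processes is more involved and is not reproduced here. The function name `gaussian_kl` is our own and is hypothetical.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL(N(mu0, cov0) || N(mu1, cov1)) in nats, closed form.

    Illustrative building block only (hypothetical helper); not the
    process-level divergence between two dynamic textures.
    """
    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
    cov0, cov1 = np.asarray(cov0, float), np.asarray(cov1, float)
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    # slogdet avoids overflow/underflow of det() for larger covariances
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return float(0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff
                        - k + logdet1 - logdet0))

if __name__ == "__main__":
    I = np.eye(2)
    print(gaussian_kl([0, 0], I, [0, 0], I))  # 0.0
    print(gaussian_kl([0, 0], I, [1, 0], I))  # 0.5
```

Note that the divergence is not symmetric: swapping the two distributions generally changes the value.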

As mentioned above, we are investigating the utility of a linear dynamic system as the input space for a few motion classification techniques. In this approach, the spatial appearance and the underlying motion of the video are modeled separately. Both models, however, are constructed from all three dimensions of the video (i.e., they use interframe data).
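The separation of appearance and motion can be made concrete with the SVD-based learning procedure commonly used for dynamic textures (Doretto et al., 2003). The sketch below is a Python/NumPy illustration under stated assumptions: grayscale clips are stored as a `(frames, height, width)` array, and the state dimension `n_states` is a free parameter. It is not necessarily the procedure implemented in our Matlab code. The columns of `C` capture appearance, and `A` (with noise covariance `Q`) captures motion.

```python
import numpy as np

def learn_dynamic_texture(frames, n_states=10):
    """Fit x_{t+1} = A x_t + v_t,  y_t = C x_t + w_t + y_mean  to one clip.

    Sketch of the SVD-based (suboptimal) learning method; the input layout
    (F, H, W) and the returned dict keys are assumptions for illustration.
    """
    frames = np.asarray(frames, dtype=float)
    F = frames.shape[0]
    Y = frames.reshape(F, -1).T               # each column is one vectorized frame
    y_mean = Y.mean(axis=1, keepdims=True)
    Y0 = Y - y_mean
    U, s, Vt = np.linalg.svd(Y0, full_matrices=False)
    n = min(n_states, len(s))
    C = U[:, :n]                              # appearance: principal components
    X = s[:n, None] * Vt[:n, :]               # hidden state sequence, shape (n, F)
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])  # motion: least-squares transition
    V = X[:, 1:] - A @ X[:, :-1]              # state-noise residuals
    Q = V @ V.T / (F - 1)
    W = Y0 - C @ X                            # observation-noise residuals
    r = float(np.mean(W ** 2))
    return {"A": A, "C": C, "Q": Q, "r": r, "x0": X[:, 0], "y_mean": y_mean.ravel()}
```

For a 50-frame clip, `learn_dynamic_texture(clip, n_states=5)` returns a 5×5 transition matrix and a basis `C` with orthonormal columns, one row per pixel.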

The first two dimensions are what is shown in any one frame; the third dimension is time. Older methods of motion classification dealt with individual frames independently (i.e., segmentation) and tracked recognized shapes through a sequence of frames. Our classification approach includes all three dimensions in feature extraction. Classification can therefore be performed on video whose information lies in the interframe motion, which is a necessity when the video contains only fluid motion.

After our linear dynamic system is constructed, “classification is performed by selecting the category corresponding to the Gaussian-mixture hidden Markov model (GM-HMM) of largest likelihood for the query video.”ii This has been shown experimentally to be an accurate process for classifying congestion levels on a freeway. However, we are interested in discovering how wide a range of classifiers this method can support.
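The quoted decision rule (pick the class whose HMM gives the query the largest likelihood) can be sketched in Python with the forward algorithm in log space. This sketch rests on several assumptions. Each query is a sequence of per-frame feature vectors. Each state emits a single diagonal Gaussian, which is a simplification; a GM-HMM would use a mixture per state. Training, for example with Baum-Welch, is not shown. The function names are hypothetical.

```python
import numpy as np

def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)

def hmm_log_likelihood(obs, log_pi, log_A, means, variances):
    """log p(obs | HMM) via the forward algorithm.

    obs: (T, d) features; log_pi: (K,); log_A[i, j] = log p(j | i);
    means, variances: (K, d) diagonal Gaussian emissions (one per state).
    """
    obs = np.asarray(obs, float)
    means, variances = np.asarray(means, float), np.asarray(variances, float)
    diff = obs[:, None, :] - means[None, :, :]                    # (T, K, d)
    log_b = -0.5 * np.sum(diff ** 2 / variances
                          + np.log(2 * np.pi * variances), axis=2)  # (T, K)
    log_alpha = np.asarray(log_pi, float) + log_b[0]
    for t in range(1, obs.shape[0]):
        log_alpha = _logsumexp(log_alpha[:, None] + log_A, axis=0) + log_b[t]
    return float(_logsumexp(log_alpha, axis=0))

def classify(obs, models):
    """Return the label whose model assigns the query the largest likelihood."""
    scores = {label: hmm_log_likelihood(obs, *params) for label, params in models.items()}
    return max(scores, key=scores.get)
```

Here `models` maps a class label such as "high" to that class's `(log_pi, log_A, means, variances)`. Working in log space keeps long sequences from underflowing.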

Our objective is to apply our method of classification to many interesting scenes; the scenes filmed are listed under Scenes Shot. We have compiled a video database with over 24 hours of video of more than 10 different scenes. Each scene’s subject is fundamentally either pedestrian or vehicular traffic flow.

Our goal for each scene was to film from three different angles, with both a panning and a still shot from each angle. Scenes were filmed on multiple occasions at different times of day in order to capture different levels of flow. For each scene, the goal was to build simple motion classifiers that distinguish between congestion levels and types of traffic flow (e.g., pedestrians exiting a building or cars turning left at an intersection).

We will demonstrate robust classification of pedestrian as well as vehicular traffic flow scenes (e.g., high, medium, or low congestion; entering or exiting; walking away from or toward the camera).

Currently the video database we have compiled is on the web at:

<www.svcl.ucsd.edu/projects/motiondb>

Scenes Shot

Pedestrian Throughway

01. Sidewalk 01 (near CG BLDG)

02. Sidewalk 02 (near Economics BLDG)

03. UCSD Library Walk

04. Sidewalk 03 (Downtown Fifth Ave.)

05. Parade (Hillcrest)

06. Plaza (UCSD Center Hall)

Pedestrian Entrance/Exit

07. Geisel Library

08. Sea World Exit

09. Atkinson Hall (Emergency Evacuation)

10. Concert (Not shot yet)

Vehicular Throughway

11. Intersection 01

12. Intersection 02

Vehicular Entrance/Exit

13. Freeway Entrance

14. Parking Structure

Process Overview

Videography:

Video was shot using a Sony DCR-HC38 Handycam.

All angles were shot with the camera zoomed fully wide and the tripod legs fully extended, unless otherwise noted. However, most panning shots were taken with the center arm not extended, for better stability while manually rotating the camera.

Care was taken to shoot each angle completely before moving the tripod to another location. However, not every traffic scene could be shot on the same day, so some slight variation in an angle’s view was incurred. This error was minimized either by (1) cropping the video so that we had the same view in frame across all sessions, or (2) re-shooting the entire scene for more consistency.

After shooting, the video was cropped temporally to cut out outliers and obtain a pure class (e.g., only low traffic or only cars turning right). The frame dimensions were then reduced by 50%, and the video was compressed using the MPEG-4 (.mp4), H.264 (.mov), and Cinepak (.avi) codecs. Care was taken not to compress excessively, since noticeable artifacts would occur.

Directories:

1) < /data2/vidtex/mulloy > - Contains three main folders.

a) <.../video > - Contains raw video as filmed.

b) <.../Data > - Contains video clips that went through ‘vid2y48.m’

c) <.../MetaData > - Contains:

i) run_experiment.m - Script that models the processed video clips; trains, tests, and validates the classifiers; and reports the error results. Experiment parameters are set for the particular scene.

ii) ImageMaster.mat - Contains the list of pre-processed video directories and their respective classes (e.g., high, medium, low).

iii) exp**.mat - Contains the workspace saved from executing ‘run_experiment.m’ in Matlab, including the experimental results.

2) ~/vidtex - Contains Matlab m-files used by programs such as ‘vid2y48.m’ and ‘run_experiment.m’.

Programs:

1) Matlab

2) vid2y48.m -- m-file within Matlab. This program must be run on the server “weiner”, as this server has the required video codecs.

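3) run_experiment.m -- m-file within Matlab that models the processed clips and trains, tests, and validates the classifiers (see MetaData above).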

Process:

1. Run vid2y48.m in Matlab. This program converts full-length color clips into grayscale data clips of arbitrary length (usually 50 frames). It also outputs ‘ImageMaster’, a metadata file containing the directory and class associated with every output clip.
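As a rough Python illustration of what step 1 is described as doing, the sketch below converts color frames to grayscale, applies the 50% size reduction mentioned under Videography, and cuts the result into 50-frame clips with one class label per clip. It rests on these assumptions: frames are already decoded into a NumPy array, grayscale uses ITU-R BT.601 luma weights, the reduction is 2×2 block averaging, and the metadata records (loosely in the spirit of ImageMaster) are plain dicts. vid2y48.m itself may differ, and all function names here are hypothetical.

```python
import numpy as np

def to_gray(frames_rgb):
    """(F, H, W, 3) RGB -> (F, H, W) luma, using ITU-R BT.601 weights (assumed)."""
    return np.asarray(frames_rgb, float) @ np.array([0.299, 0.587, 0.114])

def half_size(frames):
    """Halve height and width by averaging non-overlapping 2x2 pixel blocks."""
    F, H, W = frames.shape
    H2, W2 = H // 2, W // 2
    f = frames[:, :2 * H2, :2 * W2]  # drop an odd trailing row/column
    return f.reshape(F, H2, 2, W2, 2).mean(axis=(2, 4))

def split_clips(frames, label, clip_len=50):
    """Cut a video into non-overlapping clips; the short remainder is dropped."""
    n = frames.shape[0] // clip_len
    clips = [frames[i * clip_len:(i + 1) * clip_len] for i in range(n)]
    # one hypothetical metadata record per clip: its index and its class
    meta = [{"clip": i, "label": label} for i in range(n)]
    return clips, meta
```

A 120-frame labeled video yields two 50-frame clips. The last 20 frames are discarded rather than padded, so every clip has the same length.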

2. Run run_experiment.m, located in each scene’s MetaData folder, with scene-specific parameters.

3. Based on results, adjust experiment parameters.

4. Repeat steps 2 and 3.

01. Sidewalk01

(Pedestrian Throughway)

Shot Description: Shot near the Cognitive Science Building and the Singing Tree. Three angles: (1) Oblique - 6-foot tripod fully extended above head height and placed on a boulder to give a slight bird’s-eye view, facing east toward Library Walk. (2) Front - eye level, two tripod legs on a crack in the cement, facing west. (3) Side - eye level, in the dirt, facing north.

Filming Status:

Still Shots -

(1) Oblique Angle - Missing ‘medium’ traffic level. However, this can be made up using a ‘medium-high’ class.

(2) Front Angle - Complete. Two slightly different versions.

(3) Side Angle - Missing ‘high’ level. This level cannot be shot until October.

Panning Shots -

No panning shots were filmed due to time constraints.

Traffic-Level Classifier Description: Discriminating between high, medium, and low flows of pedestrian traffic. Three experiments, one per angle.

(1) Oblique - Initial Experiment Results:

Although a distinct ‘medium’ class has not been shot yet, we were able to achieve high accuracy using two levels of ‘high’ (i.e., ‘low-high’ and ‘high-high’).

md_error_mean = 0.1659 = 83.41% Accurate

kl_error_mean = 0.2505 = 74.95%

kl_svmerror_mean = 0.3141 = 68.59%

ikl_error_mean = 0.3676 = 63.24%

ikl_svmerror_mean = 0.3004 = 69.96%
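Each accuracy quoted above is simply 1 - error_mean (e.g., 1 - 0.1659 = 0.8341). The sketch below shows one way an error rate of this kind could be produced. It assumes that the `md` prefix denotes a nearest-neighbor rule under the Martin distance between dynamic-texture models; the abbreviations suggest this, but the notes do not state it. The Martin distance is approximated here from principal angles between finite (m-block) observability matrices, and the horizon `m = 20` is an arbitrary assumption. All function names are hypothetical.

```python
import numpy as np

def observability(A, C, m=20):
    """Finite observability matrix [C; CA; CA^2; ...; CA^(m-1)]."""
    blocks, M = [], np.asarray(C, float)
    for _ in range(m):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def martin_distance(A1, C1, A2, C2, m=20):
    """Martin distance approximated from principal angles between the
    column spaces of two finite observability matrices."""
    Q1, _ = np.linalg.qr(observability(A1, C1, m))
    Q2, _ = np.linalg.qr(observability(A2, C2, m))
    # singular values of Q1^T Q2 are the cosines of the principal angles
    cos = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), 1e-12, 1.0)
    # d^2 = -ln(prod cos^2) = -2 * sum(ln cos)
    return float(np.sqrt(max(0.0, -2.0 * np.sum(np.log(cos)))))

def nn_error_rate(train_models, train_labels, test_models, test_labels, m=20):
    """Fraction of test models whose nearest training model (in Martin
    distance) carries a different label."""
    errors = 0
    for (A, C), label in zip(test_models, test_labels):
        d = [martin_distance(A, C, A2, C2, m) for A2, C2 in train_models]
        if train_labels[int(np.argmin(d))] != label:
            errors += 1
    return errors / len(test_labels)
```

Averaging `nn_error_rate` over several train/test splits gives a mean error. Because the distance compares subspaces, it is unchanged by a change of state basis (`A -> T^-1 A T`, `C -> C T`).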

(2) Front - Initial Experiment Results:

Waiting on results. ETA: less than one week.

(3) Side - Initial Experiment Results:

Waiting on ‘high’ class. ETA: October.

02. Sidewalk02

(Pedestrian Throughway)

Shot Description: Shot near the Economics Building on Ridgewalk, mid-day. Three angles: (1) Front - eye level, set up under the bridge, facing south. (2) Side - eye level, facing west. (3) Oblique - shot from the middle of the bridge, facing south. ‘Medium’ and ‘low’ traffic are readily available; ‘heavy’ traffic, however, was captured only during passing periods.

Filming Status:

Still Shots -

(1) Oblique Angle - Complete.

(2) Front Angle - Complete.

(3) Side Angle - Missing ‘high’ level. This level cannot be shot until October.

Panning Shots -

Panning-Low-Front was shot successfully.

Traffic-Level Classifier Description: Discriminating between high, medium, and low flows of pedestrian traffic. Three experiments, one per angle.

(1) Front - Initial Experiment Results:

(2) Side - Initial Experiment Results:

(3) Oblique - Initial Experiment Results:

03. Library Walk

(Pedestrian Throughway)

Shot Description: Library Walk is a throughway used by thousands of people throughout the day. At times there are so many people using it that it is difficult even to walk.

Three angles: (1) Front - eye level, set up in the center of the walkway, facing south. (2) Side - eye level, facing west. (3) Oblique - shot from the center of the 5th floor of Geisel Library, UCSD, mid-day, facing south.

Filming Status:

Still Shots -

(1) Oblique Angle - Complete. Two versions of this angle, shot at different times of year.

(2) Front Angle - Complete.

(3) Side Angle - Missing ‘high’ and ‘medium’ levels. These levels cannot be shot until October.

Panning Shots -

Missing Panning-Front.

Panning Oblique is not compatible.

Traffic-Level Classifier Description: Discriminating between high, medium, and low flows of pedestrian traffic. Three experiments, one per angle.

(1) Front - Initial Experiment Results:

md_error_mean = 0.3558 = 64.42% Accurate

kl_error_mean = 0.3657 = 63.43%

kl_svmerror_mean = 0.4138 = 58.62%

ikl_error_mean = 0.3800 = 62.00%

ikl_svmerror_mean = 0.2983 = 70.17%

(2) Side - Initial Experiment Results:

Waiting on ‘high’ and ‘medium’ classes.

(3) Oblique - Initial Experiment Results:

md_error_mean = 0.4314 = 56.86% Accurate

kl_error_mean = 0.5329 = 46.71%

kl_svmerror_mean = 0.4155 = 58.45%

ikl_error_mean = 0.5611 = 43.89%

ikl_svmerror_mean = 0.4546 = 54.54%

(3) Oblique Expectations: These errors were obtained before the data was separated into two versions. Better accuracy is expected for each version once they are run through the experiment separately. However, these results show that there is a large error associated with calibration.

05. Parade

(Pedestrian Throughway)

Shot Description: Pride Parade, summer 2006. Shot the streets just after the parade ended, when they were taken over by pedestrians, most of them walking in the same direction.

Filmed three angles plus walking footage. Three angles: (1) Front - eye level, facing pedestrians as they walked toward the camera. (2) Side - eye level, facing perpendicular to the motion of the crowd. (3) Oblique - off at an angle between the first two angles, above eye level. Walking: walked with the crowd, camera at chest level.

06. Plaza

(Pedestrian Throughway)

Shot Description: Center Hall Plaza, at the end of Library Walk south of the library, is a large area that many people use between and during classes. It is an interesting space that contains pedestrians, bikers, people sitting, people talking but not walking, and campus tours. One angle: (1) Oblique - second-story oblique angle of the entire courtyard. Expected angles: (2) Front - eye level on Library Walk, facing south. (3) Side - side of Library Walk, facing south-east.

Filming Status:

Still Shots -

(1) Oblique Angle - Complete

(2) Front Angle - Not Shot

(3) Side Angle - Not Shot

Panning Shots -

Traffic-Level Classifier Description: Discriminating between high, medium, and low flows of pedestrian traffic. Three experiments, one per angle.

(1) Oblique - Initial Experiment Results:

md_error_mean = 0.3259 = 67.41%

kl_error_mean = 0.3111 = 68.89%

kl_svmerror_mean = 0.2733 = 72.67%

ikl_error_mean = 0.2734 = 72.66%

ikl_svmerror_mean = 0.2810 = 71.90%

07. Geisel Library

(Pedestrian Entrance /Exit)

Shot Description: Geisel Library is located near Price Center on the University of California, San Diego campus. The library has many floors, all filled with books and computers. A steady flow of traffic can be seen constantly at the entrance, even during summer sessions.

Three angles: (1) Front - eye level, set up on the edge of a semicircle of concrete surrounding a few trees. The back two legs of the tripod were lined up with the joint between two cement segments. This is to the left of the Silent Tree, looking north. (2) Side - eye level, facing east. Set up inside the library in the ‘Seuss Room,’ shooting out through the windows. (3) Oblique - shot from the center of the 5th floor of Geisel Library, UCSD, mid-day, facing south, revealing a bird’s-eye view of the walkway just outside the library’s main entrance.

Filming Status:

Still Shots -

(1) Front Angle - Shoot complete, needs processing.

(2) Side Angle - Not enough traffic variation in tight angle.

(3) Oblique Angle - Complete

Panning Shots -

(1) Front Angle - Not shot

(2) Side Angle - Complete

(3) Oblique - Complete

Traffic-Level Classifier Description: Discriminating between high, medium, and low flows of pedestrian traffic. Three experiments, one per angle.

(1) Front - Initial Experiment Results:

(2) Side - Initial Experiment Results:

(3) Oblique - Initial Experiment Results:

Entering vs Exiting Classifier Description: Discriminating between pedestrians Entering, Exiting, and a Balance of the two. Three experiments, one per angle.

(1) Front - Initial Experiment Results:

(2) Side - Initial Experiment Results:

(3) Oblique - Initial Experiment Results:

10. Rock Concert (Not Shot Yet)

(Pedestrian Entrance/Exit)

Shot Description: We will shoot the entrance/exit of a rock concert. We expect a steady flow into the venue, and light, medium, and heavy outflow. Three angles: (1) Front - eye level, head-on, facing the people exiting. (2) Side - eye level, facing perpendicular to the motion. (3) Oblique - shot from high above, with a wider angle of the exit.

Filming Status: Venue not yet found.

11. Intersection01

(Vehicular Intersection)

Shot Description: Shot at the intersection of La Jolla Village Drive and Villa La Jolla Drive over the course of several days during mid-summer. The three angles observed were (1) Front - ground level, center, facing west, (2) Side - ground level, facing south, and (3) Oblique - bird’s-eye view, facing east.

Filming Status:

Still Shots -

(1) Front Angle - Complete

(2) Side Angle - Complete

(3) Oblique Angle - Complete

Panning Shots -

(1) Front Angle - Complete

(2) Side Angle - Complete

(3) Oblique - Complete

Traffic-Level Classifier Description: Initially we attempted to create a congestion-level classifier for Intersection01. This was unsuccessful for several reasons: (1) there may be many cars in the intersection, but if they are not moving, the intersection congestion is classified as low; (2) there is no clear definition of ‘high’, ‘medium’, and ‘low’, i.e., we lack the language to describe intersection congestion.

Experimental results: ~60% accurate.

Intersection State Classifier Description: Discriminates between six states of the intersection using only the Oblique angle. Eight states are defined:

(1) East-West-Thru (EWT) - initially east-bound and west-bound cars continue straight through the intersection.

(2) East-West-Left-Turn (EWLT) - initially east-bound and west-bound vehicles turn left through the intersection.

(3) North-South-Thru (NST) - initially north-bound and south-bound vehicles continue straight through the intersection.

(4) North-South-Left-Turn (NSLT) - initially north-bound and south-bound vehicles turn left through the intersection.

(5) South-Both-Left-Thru (SBLT) - initially south-bound vehicles turn left and continue through the intersection.

(6) West-Both-Left-Thru (WBLT) - initially west-bound vehicles turn left and continue straight through the intersection.

(7) East-Both-Left-Thru (EBLT) - initially east-bound vehicles turn left and continue straight through the intersection.

(8) North-Both-Left-Thru (NBLT) - initially north-bound vehicles turn left and continue straight through the intersection.

Shown below are depictions of the first six states.

(1) Oblique - Initial Experiment Results:

md_error_mean = 0.1676 = 83.24% Accurate

kl_error_mean = 0.1579 = 84.21%

kl_svmerror_mean = 0.1809 = 81.91%

ikl_error_mean = 0.3234 = 67.66%

ikl_svmerror_mean = 0.2713 = 72.87%

12. Intersection02

(Vehicular Intersection)

Shot Description: Shot at the intersection of La Jolla Village Square and Nobel Dr. Three angles: (1) Front - west side of the intersection facing east, eye level, set up on the island that separates east- and west-moving vehicles. (2) Side - north side facing south, eye level, with vehicles exiting the plaza on our right and vehicles entering on the left. (3) Oblique - shot from a nearby hill on the northwest corner facing southeast, a little higher than eye level.

14. Parking Structure

(Vehicular Entrance/Exit)

Shot Description: Gilman Parking Structure is located at the south-east edge of UCSD. During the school year, parking spaces usually fill up by 8 or 9 am, so the largest incoming flux is early in the morning. At mid-day there is a steady flow of cars in and out through the south-facing exit. Around 4-5 pm, exiting peaks, as this is when many people get off work.

Preliminary shots were taken from a few angles. The most promising angles are (1) from the east side of the south exit, (2) from the center of the exit, facing the parking structure so that we are facing cars exiting, and (3) from the second or third story, shooting down upon the cars.

Filming Status: Preliminary stage.

Expectations: Although this may be a good scene for building a database of vehicular entrances/exits, a new, larger parking structure is opening just south of RIMAC, and it may be a more ideal location. Furthermore, Horton Plaza in downtown San Diego has a large parking structure that may be worth examining for potential footage.

Ongoing Research

From here we will continue to film and to build classifiers on the video we currently have. We noticed a bug in our experimentation: we achieved higher accuracy when experiments were performed entirely from scratch (i.e., rebuilding the SVMs and nearest-neighbor classifiers from scratch) after each change in the experimental parameters. We fixed this bug by deleting the previous classifier before building a new one in each experiment.
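The fix amounts to constructing a new classifier object inside the parameter loop, so that nothing trained under a previous setting carries over into the next. The Python sketch below illustrates this pattern. A minimal, hypothetical k-nearest-neighbor class on feature vectors stands in for our SVM and nearest-neighbor classifiers, and the swept parameter `k` is only an example; the Matlab code is not reproduced here.

```python
from collections import Counter
import numpy as np

class KNearestNeighbors:
    """Hypothetical minimal k-NN classifier (Euclidean distance, majority vote)."""

    def __init__(self, k=1):
        self.k = k
        self.X, self.y = None, None

    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), list(y)
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        d = np.linalg.norm(X[:, None, :] - self.X[None, :, :], axis=2)
        nearest = np.argsort(d, axis=1)[:, : self.k]
        # vote ties go to the label encountered first, i.e. the nearest-ranked one
        return [Counter(self.y[i] for i in row).most_common(1)[0][0] for row in nearest]

def sweep(k_values, X_train, y_train, X_test, y_test):
    """Error rate for each k, building a brand-new classifier every time."""
    errors = {}
    for k in k_values:
        clf = KNearestNeighbors(k=k).fit(X_train, y_train)  # never reuse a model
        pred = clf.predict(X_test)
        errors[k] = float(np.mean([p != t for p, t in zip(pred, y_test)]))
    return errors
```

Because each iteration starts from a fresh object, the error reported for a given setting depends only on that setting and the data, not on the order in which settings were tried.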

Although it is tempting to clean up the training data to build a more accurate classifier, outliers that fall between classes are retained. This gives us more accurate feedback on our ability to model real-world data.

We will also attempt to mix angles. We want a classifier that discriminates motion not only within a single angle but also between the angles themselves.

For example, if congestion can be accurately classified from two angles independently (e.g., high, medium, and low from the oblique and front angles, giving 3 classes for each of two classifiers), then these should be combinable into one classifier (e.g., 6 classes in one classifier). This should not affect accuracy.
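One way to set up the combined classifier is to pool the training clips from both angles and give each clip a composite (angle, level) label, which yields 6 classes from two 3-class problems. The Python sketch below does this with a hypothetical 1-nearest-neighbor rule on generic feature vectors. It assumes each clip has already been reduced to a feature vector. It reports both the 6-class accuracy and the congestion-level accuracy with the angle ignored, since the two can differ.

```python
import numpy as np

def combine_angles(datasets):
    """datasets: {angle: (X, levels)} -> pooled X and composite (angle, level) labels."""
    X_all, y_all = [], []
    for angle, (X, levels) in datasets.items():
        X_all.append(np.asarray(X, float))
        y_all.extend((angle, lvl) for lvl in levels)
    return np.vstack(X_all), y_all

def nn_predict(X_train, y_train, X_test):
    """Hypothetical 1-NN rule: copy the label of the closest training vector."""
    X_test = np.asarray(X_test, float)
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return [y_train[i] for i in np.argmin(d, axis=1)]

def accuracies(pred, truth):
    """Accuracy on the composite classes and on congestion level alone."""
    full = np.mean([p == t for p, t in zip(pred, truth)])
    level = np.mean([p[1] == t[1] for p, t in zip(pred, truth)])
    return float(full), float(level)
```

A prediction of ("front", "low") for a true ("oblique", "low") clip counts as wrong on the 6-class task but correct on congestion level. Comparing the two numbers is one way to check whether merging angles costs accuracy.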