Traditional motion representations, based on optical flow,
are inherently local and have significant difficulties when
faced with aperture problems and noise. The classical solution
to this problem is to regularize the optical flow
field,
but this introduces undesirable smoothing across motion edges
or regions where the motion is, by definition, not smooth
(e.g. vegetation in outdoor scenes). It also does not provide
any information about the objects that compose the scene, although
the optical flow field could be subsequently used for motion
segmentation. More recently, there have
been various attempts to model videos as a superposition of
layers subject to homogeneous motion. While layered representations
have shown significant promise in combining the advantages
of regularization (use of global cues to determine local motion)
with the flexibility of local representations (little undue smoothing),
and in offering a truly object-based representation, this potential has not yet
been fully realized. One of the main limitations is their dependence on
parametric motion models, such as affine transforms, which assume a
piecewise planar world that rarely holds in
practice. In fact, layers are
usually formulated as "cardboard" models of the world that are
warped by such transformations and then stitched to form the frames in
a video stream. This severely limits the types of videos
that can be synthesized: while the concept of layering showed most promise for
the representation of scenes composed of ensembles of objects subject to
homogeneous motion (e.g. leaves blowing in the wind, a flock of birds, a
picket fence, or highway traffic), very little progress has so far been
demonstrated in actually modeling such scenes.
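As an illustration of the "cardboard" formulation described above, the following minimal Python/NumPy sketch composites a frame from rigid layers, each a fixed appearance image and mask moved by its own affine transform x ↦ Ax + b. The function names, nearest-neighbour sampling, and back-to-front pasting are illustrative assumptions, not any specific published layered model.

```python
import numpy as np

def warp_affine_nearest(img, A, b):
    """Warp img by x -> A @ x + b, with x = (col, row), using inverse mapping
    and nearest-neighbour sampling. Also returns a mask of output pixels whose
    source location fell inside img."""
    h, w = img.shape
    rows, cols = np.mgrid[0:h, 0:w]
    out_xy = np.stack([cols.ravel(), rows.ravel()]).astype(float)  # (2, h*w)
    src = np.linalg.solve(A, out_xy - b[:, None])  # source coords of each output pixel
    sc = np.rint(src[0]).astype(int)
    sr = np.rint(src[1]).astype(int)
    inside = (sc >= 0) & (sc < w) & (sr >= 0) & (sr < h)
    out = np.zeros(h * w, dtype=img.dtype)
    out[inside] = img[sr[inside], sc[inside]]
    return out.reshape(h, w), inside.reshape(h, w)

def composite_cardboard_layers(background, layers):
    """layers: list of (appearance, mask, A, b), ordered back to front.
    Each rigid 'cardboard' layer is warped as a whole and pasted over the frame."""
    frame = background.astype(float).copy()
    for appearance, mask, A, b in layers:
        warped, inside = warp_affine_nearest(appearance.astype(float), A, b)
        warped_mask, _ = warp_affine_nearest(mask.astype(float), A, b)
        paste = inside & (warped_mask > 0.5)
        frame[paste] = warped[paste]
    return frame
```

For example, translating a one-pixel layer by b = (1, 0) moves it one column to the right. Because every pixel of a layer shares the same A and b, a model of this kind cannot represent the non-rigid motion of, say, leaves blowing in the wind.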
Recently, there has been more success in modeling complex scenes as
dynamic textures or, more precisely, samples from stochastic
processes defined over space and
time. This work
has demonstrated that global stochastic modeling of both video
dynamics and appearance is much more powerful than the classic global
modeling as "cardboard" figures under parametric motion. In fact, the
dynamic texture (DT) has shown
a surprising ability to abstract a wide variety of complex patterns
of motion and appearance into a simple spatio-temporal model.
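The dynamic texture is commonly written as a linear dynamical system: a low-dimensional hidden state x_t evolves as x_{t+1} = A x_t + v_t, and each vectorized frame is a linear function of that state, y_t = C x_t + w_t + ybar, with Gaussian noise v_t and w_t. The sketch below samples frames from this model; the isotropic observation noise r I and the parameter names are assumptions made for brevity.

```python
import numpy as np

def sample_dynamic_texture(A, C, Q, r, mu0, S0, ybar, T, rng=None):
    """Sample T vectorized frames from a dynamic texture (linear dynamical system):
         x_1 ~ N(mu0, S0)
         x_{t+1} = A x_t + v_t,        v_t ~ N(0, Q)
         y_t     = C x_t + w_t + ybar, w_t ~ N(0, r I)
    A: (n, n) state transition, C: (m, n) observation matrix (m = number of pixels)."""
    rng = np.random.default_rng() if rng is None else rng
    n, m = A.shape[0], C.shape[0]
    x = rng.multivariate_normal(mu0, S0)  # initial hidden state
    Y = np.empty((T, m))
    for t in range(T):
        # appearance: project the state onto all pixels, add the mean and noise
        Y[t] = C @ x + ybar + np.sqrt(r) * rng.standard_normal(m)
        # dynamics: evolve the hidden state with process noise
        x = A @ x + rng.multivariate_normal(np.zeros(n), Q)
    return Y
```

Stable dynamics require the eigenvalues of A to lie inside the unit circle. Note that a single C maps one state to every pixel at once, which is exactly what makes the model global.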
One major limitation of the DT, however, is its inability to decompose visual
processes consisting of multiple co-occurring dynamic textures,
for example, a flock of birds flying in front of a water fountain or
highway traffic moving at different speeds, into separate regions
of distinct but homogeneous dynamics. In such cases, the global nature
of the existing DT model makes it inherently ill-equipped to segment the
video into its constituent regions.
In this work, we address this limitation by introducing a new generative
model for videos, which we call the
layered dynamic texture (LDT).
The LDT augments the dynamic texture with a discrete
hidden variable that enables the assignment of different dynamics
to different regions of the video. The hidden variable is modeled as a
Markov random field (MRF) to ensure spatial smoothness of the regions;
conditioned on the state of this hidden
variable, each region of the video is a standard DT.
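A Potts model is one common way to express such a spatial-smoothness prior; whether it matches the exact clique potentials used in the LDT papers is an assumption here. The following Gibbs sampler draws a label map z in which each pixel prefers the labels of its four neighbours, with the hypothetical parameter beta controlling how strongly.

```python
import numpy as np

def sample_potts_labels(h, w, K, beta, sweeps=20, rng=None):
    """Gibbs-sample an h x w label map z from a Potts MRF on a 4-connected grid:
    p(z) proportional to exp(beta * #{neighbouring pairs with equal labels})."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.integers(K, size=(h, w))  # random initial labels in [0, K)
    for _ in range(sweeps):
        for i in range(h):
            for j in range(w):
                # count how many 4-neighbours currently carry each label
                counts = np.zeros(K)
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        counts[z[ni, nj]] += 1
                # conditional p(z_ij = k | neighbours) proportional to exp(beta * counts[k])
                logits = beta * counts
                p = np.exp(logits - logits.max())
                z[i, j] = rng.choice(K, p=p / p.sum())
    return z
```

With beta = 0 the labels are independent and uniform; larger beta favours larger contiguous regions.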
By introducing a shared dynamic representation for all pixels
in a region, the new model is a layered representation. When compared
with traditional layered models, it replaces layer formation by
"warping cardboard figures" with sampling from the generative model
(for both dynamics and appearance)
provided by the DT. This enables a much richer video
representation. Since each layer is a DT, the
model can also be seen as a multi-state dynamic texture, which is capable
of assigning different dynamics and appearance to different image regions.
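Conditioned on a label map z, this multi-state view can be sketched directly: every layer j runs its own dynamic texture with parameters (A_j, Q_j, C_j, r_j), and pixel i is generated from the state of layer z_i through its own row of C_{z_i}. This is a minimal sketch under stated assumptions (a fixed initial state per layer, zero-mean appearance, isotropic per-layer noise, and hypothetical dictionary keys), not the authors' implementation.

```python
import numpy as np

def sample_layered_dt(z, layers, T, rng=None):
    """Sample T frames of a layered dynamic texture given a label map z.

    z: (h, w) integer map assigning each pixel to a layer in [0, len(layers)).
    layers: list of dicts with keys A (n, n), Q (n, n), C (h*w, n), r (float),
            mu0 (n,), i.e. one dynamic texture per layer.
    Every layer evolves its own hidden state; pixel i is read out from the
    state of layer z_i through row C[i] of that layer."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = z.shape
    labels = z.ravel()
    states = [np.asarray(L["mu0"], dtype=float) for L in layers]
    Y = np.empty((T, h * w))
    for t in range(T):
        for j, L in enumerate(layers):
            idx = np.flatnonzero(labels == j)  # pixels owned by layer j
            noise = np.sqrt(L["r"]) * rng.standard_normal(idx.size)
            Y[t, idx] = L["C"][idx] @ states[j] + noise
        # advance every layer's state independently
        states = [L["A"] @ s + rng.multivariate_normal(np.zeros(len(s)), L["Q"])
                  for L, s in zip(layers, states)]
    return Y.reshape(T, h, w)
```

Because pixels in the same layer read from one shared state trajectory, they move coherently, while pixels in different layers can follow entirely different dynamics.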
We apply the LDT to motion
segmentation of challenging video sequences.
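One simple way to turn per-layer evidence into a segmentation is iterated conditional modes (ICM) under a Potts smoothness term. Given a score ll[k, i, j] for each pixel and layer (for instance, a log-likelihood of the pixel's time series under layer k), each pixel is repeatedly set to the layer that maximizes its own score plus beta times the number of neighbours sharing that layer. ICM is a hypothetical stand-in chosen for brevity; it is not the inference procedure of the LDT papers.

```python
import numpy as np

def icm_segment(ll, beta, iters=10):
    """Iterated conditional modes for a Potts-regularized segmentation.

    ll: (K, h, w) per-layer score for each pixel (e.g. a log-likelihood).
    Starting from the per-pixel argmax, each pixel is greedily set to
    argmax_k ll[k, i, j] + beta * (#4-neighbours labelled k), until no label changes."""
    K, h, w = ll.shape
    z = ll.argmax(axis=0)
    for _ in range(iters):
        changed = False
        for i in range(h):
            for j in range(w):
                score = ll[:, i, j].copy()
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        score[z[ni, nj]] += beta  # reward agreeing with a neighbour
                k = int(score.argmax())
                if k != z[i, j]:
                    z[i, j] = k
                    changed = True
        if not changed:
            break
    return z
```

With beta = 0 this reduces to an independent per-pixel argmax; a positive beta overrides isolated pixels whose evidence only weakly favours a different layer.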
Selected Publications:
- Layered Dynamic Textures
A. B. Chan and N. Vasconcelos,
IEEE Transactions on Pattern Analysis and Machine Intelligence: Special Issue on Probabilistic Graphical Models in Computer Vision (TPAMI), to appear 2009.
© IEEE.
- Variational Layered Dynamic Textures
A. B. Chan and N. Vasconcelos,
In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR),
Miami, June 2009.
© IEEE.
- Derivations for the Layered Dynamic Texture and Temporally-Switching Layered Dynamic Texture
A. B. Chan and N. Vasconcelos,
Technical Report SVCL-TR-2009-01,
June 2009.
- Layered Dynamic Textures
A. B. Chan and N. Vasconcelos,
In Neural Information Processing Systems 18 (NIPS), pp. 203-210,
Vancouver, December 2005.