Generalized Separation of Style and Content on Nonlinear Manifolds with Application to Human Motion Analysis


NSF CAREER Award number IIS-0546372

PI: Ahmed Elgammal, Rutgers University


Graduate Student Investigator: Chan-Su Lee

Duration: January 2006 to December 2012


Project Summary


Separating style from content is an essential element of visual perception and remains one of its fundamental mysteries. For example, we are able to recognize faces and actions despite wide variability in the visual stimuli. While the role of manifold representations in perception is still unclear, it is clear that images of the same object lie on a low-dimensional manifold in the visual space defined by the retinal array. Moreover, neurophysiologists have found that neural population firing is typically a function of a small number of variables, which implies that population activity also lies on low-dimensional manifolds.


This project is focused on modeling the visual manifolds of biological motion. Despite the high dimensionality of the configuration space, many human motions intrinsically lie on low-dimensional manifolds. This is true for the kinematics of the body, as well as for the motion observed through image sequences. Consider the observed motion. The silhouette (occluding contour) of a human walking or performing a gesture is an example of a dynamic shape, where the shape deforms over time based on the action being performed. These deformations are restricted by the physical body and by the temporal constraints posed by the action. Given these spatial and temporal constraints, the silhouettes, as points in a high-dimensional visual input space, are expected to lie on a low-dimensional manifold. Intuitively, the gait is a one-dimensional manifold embedded in a high-dimensional visual space. Such a manifold can be twisted and can even self-intersect in that space. Similarly, the appearance of a face performing expressions is an example of a dynamic appearance that lies on a low-dimensional manifold in the visual input space. In fact, if we consider certain classes of motion, such as gait, a single gesture, or a single facial expression, and if we factor out all other sources of variability, each of these motions lies on a one-dimensional manifold, i.e., a trajectory in the visual input space. Such manifolds are nonlinear and non-Euclidean.
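To make the dimensionality argument concrete, here is a minimal sketch in Python with NumPy. It is not the project's code: the function names, the mixture of harmonics used as a stand-in for silhouette descriptors, and all parameter values are assumptions chosen for illustration. A single phase variable drives a closed curve in a 50-dimensional space, and a linear analysis (PCA) still needs several dimensions to describe it.

```python
import numpy as np

def synthetic_gait_cycle(n_frames=200, dim=50, n_harmonics=4, seed=0):
    """Map one phase variable t in [0, 2*pi) to dim-dimensional observations.

    Hypothetical stand-in for silhouette descriptors: every output coordinate
    is a random mixture of a few harmonics of the phase.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 2.0 * np.pi, n_frames, endpoint=False)
    k = np.arange(1, n_harmonics + 1)
    # cos(k t) and sin(k t) for k = 1..n_harmonics -> (n_frames, 2 * n_harmonics)
    feats = np.hstack([np.cos(np.outer(t, k)), np.sin(np.outer(t, k))])
    mix = rng.normal(size=(2 * n_harmonics, dim))
    return t, feats @ mix

def linear_dims_needed(Y, energy=0.99):
    """Number of principal components needed to capture `energy` of the variance."""
    Yc = Y - Y.mean(axis=0)
    s = np.linalg.svd(Yc, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy) + 1)

t, Y = synthetic_gait_cycle()
print("ambient dimension:", Y.shape[1])
print("intrinsic parameters: 1 (the phase t)")
print("PCA dimensions for 99% of the variance:", linear_dims_needed(Y))
```

The curve is generated from one parameter, so its intrinsic dimensionality is one, yet its variance is spread over several linear directions. That gap is one way to see why a nonlinear manifold model is used here rather than a linear subspace.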


Although the intrinsic body configuration manifold might be very low in dimensionality, the resulting visual manifold (in terms of shape and/or appearance) is difficult to model because many factors affect the appearance. Examples include the body type of the person performing the motion (slim, big, tall, etc.), clothing, viewpoint, and illumination. Such variability makes learning a visual manifold very challenging, because we are dealing with data points that lie on multiple manifolds at the same time: the body configuration manifold, the viewpoint manifold, the body shape manifold, the illumination manifold, etc.






The main contribution of this project is a computational framework for learning a decomposable generative model that explicitly factorizes the intrinsic body configuration (content), as a function of time, from the appearance (style) factors. The framework is based on decomposing the style parameters in the space of nonlinear functions that map between a unified representation of the content manifold and style-dependent observations. Given a set of topologically equivalent manifolds, the Homeomorphic Manifold Analysis (HMA) framework models the variation in their geometries in the space of functions that map between a topologically equivalent common representation and each of them. The common representation of the content manifold can be learned from the data or can be enforced in a supervised way if the manifold topology is known. The main assumption is that the visual manifold is homeomorphic to the unified content manifold representation, and that the mapping between that unified representation and the visual space can be parameterized by different style factors.
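The factorization described above can be illustrated with a small sketch in Python with NumPy. It is a toy under stated assumptions, not the project's implementation: the content manifold is taken to be a unit circle (a reasonable choice for a periodic motion), the nonlinear mappings are Gaussian radial basis function (RBF) regressions fitted by least squares, the style decomposition is a plain SVD of the stacked mapping coefficients, and `toy_observations` with its parameters is a hypothetical stand-in for real shape descriptors.

```python
import numpy as np

def circle(t):
    """Embed phase values t on the unit circle (the unified content manifold)."""
    return np.stack([np.cos(t), np.sin(t)], axis=-1)

def rbf_features(t, centers_t, width=0.5):
    """Gaussian RBF features of circle points w.r.t. fixed centers on the circle."""
    diff = circle(t)[:, None, :] - circle(centers_t)[None, :, :]
    return np.exp(-np.sum(diff**2, axis=-1) / (2.0 * width**2))

def fit_mapping(t, Y, centers_t):
    """Least-squares fit of B in Y ~= psi(t) @ B; B has shape (n_centers, d)."""
    return np.linalg.lstsq(rbf_features(t, centers_t), Y, rcond=None)[0]

def factorize_styles(B_list):
    """Stack per-style coefficients and split them into a content basis and style vectors."""
    M = np.stack([B.ravel() for B in B_list], axis=1)      # (n_centers * d, n_styles)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U * s, Vt                                       # column k of Vt: style k

def synthesize(content_basis, style_vec, t, centers_t, d):
    """Generate observations at phases t for a given style vector."""
    B = (content_basis @ style_vec).reshape(len(centers_t), d)
    return rbf_features(t, centers_t) @ B

def toy_observations(t, amp, d=6):
    """Hypothetical style-dependent observations; `amp` plays the role of style."""
    k = np.arange(d)
    return amp * np.cos(t[:, None] + k) + (1.0 - amp) * np.sin(2.0 * t[:, None] + k)

d = 6
centers_t = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
amps = [0.2, 0.5, 0.9]                                     # three "people"
B_list = []
for i, amp in enumerate(amps):
    # Each style has its own number of frames and phase offset: the sequences
    # are not aligned frame by frame, only through the shared circle.
    t = np.linspace(0.0, 2.0 * np.pi, 30 + 10 * i, endpoint=False) + 0.1 * i
    B_list.append(fit_mapping(t, toy_observations(t, amp, d), centers_t))

content_basis, style_vectors = factorize_styles(B_list)

# Blend two learned styles and generate a full cycle for the new style.
t_new = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
blend = 0.5 * (style_vectors[:, 0] + style_vectors[:, 2])
Y_blend = synthesize(content_basis, blend, t_new, centers_t, d)
print(Y_blend.shape)
```

The design choice worth noting is that style lives entirely in the mapping coefficients, while the circle is shared by all styles. That is what lets the sketch combine sequences of different lengths and generate an intermediate style by interpolating style vectors.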




Example generative model for gait: Multiple views and multiple people:



Example generative model for facial expression: Different expressions and different people:





Modeling Multiple Continuous Manifolds: View, Posture, and Body Style


We consider modeling data lying on multiple continuous manifolds. In particular, we model the shape manifold of a person performing a motion observed from different viewpoints along a view circle at a fixed camera height. We introduce a model that ties together the body configuration (kinematics) manifold and the visual manifold (observations) in a way that facilitates tracking the 3D configuration under continuous relative view variability. The model exploits the low-dimensional nature of both the body configuration manifold and the view manifold, each of which is represented separately.

- C.-S. Lee and A. Elgammal, “Coupled Visual and Kinematics Manifold Models for Human Motion Analysis,” International Journal of Computer Vision, Volume 87, Numbers 1-2, March 2010.

- C.-S. Lee and A. Elgammal, “Modeling View and Posture Manifold for Tracking,” International Conference on Computer Vision (ICCV), 2007.




Example tracking of a ballet motion from multiple views:



(a) Sampled shapes in different views for ballet.

(b) Embedded kinematics manifold in 2D.

(c) View manifold embedded in the kinematic manifold mapping space.

(d) Velocity field value with interpolation.

(e) Test silhouette sequences.

(f) Reconstruction of 3D body pose based on estimated body configuration.


Demo Videos – a Generative Model for a ballet dance routine:


Fixing the body posture and changing the viewpoint


Fixing the viewpoint and changing the posture



Changing both the posture and viewpoint




 More demo views are available here [click]



Joint Modeling of Posture and View on a Torus: Tracking People on a Torus





We developed a framework for monocular 3D kinematic posture tracking and viewpoint estimation of periodic and quasi-periodic human motions from an uncalibrated camera. Both the visual observation manifold and the kinematic manifold of the motion are learned using a joint representation. We showed that the visual manifold of the observed shape of a human performing a periodic motion, observed from different viewpoints, is topologically equivalent to a torus. The approach is based on supervised learning of both the visual and kinematic manifolds. Instead of learning an embedding of the manifold, we learn the geometric deformation between an ideal manifold (a conceptual, topologically equivalent structure) and a twisted version of that manifold (the data). Experimental results show accurate estimation of the 3D body posture and the viewpoint from a single uncalibrated camera.


- C.-S. Lee and A. Elgammal, “Tracking People on a Torus,” IEEE Transactions on Pattern Analysis and Machine Intelligence, March 2009.

- C.-S. Lee and A. Elgammal, “Simultaneous Inferring View and Body Pose Using Torus Manifolds,” ICPR'06.
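The idea of learning a deformation from an ideal torus to the data, and then reading posture and view off the torus, can be sketched in Python with NumPy. Everything specific here is an assumption made for illustration and is not taken from the papers above: the torus is a flat torus in R^4 (chosen for simplicity), the deformation is a Gaussian RBF regression, `observe` is a hypothetical observation model standing in for silhouette descriptors, and inference is a brute-force grid search rather than a tracker.

```python
import numpy as np

def flat_torus(mu, nu):
    """Embed (body-configuration phase mu, view angle nu) on a torus in R^4."""
    return np.stack([np.cos(mu), np.sin(mu), np.cos(nu), np.sin(nu)], axis=-1)

def rbf_features(mu, nu, centers, width=0.7):
    """Gaussian RBF features of torus points w.r.t. fixed centers on the torus."""
    diff = flat_torus(mu, nu)[:, None, :] - centers[None, :, :]
    return np.exp(-np.sum(diff**2, axis=-1) / (2.0 * width**2))

def angle_grid(n):
    """All (mu, nu) pairs on an n-by-n grid over [0, 2*pi)^2, flattened."""
    a = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    mu, nu = np.meshgrid(a, a, indexing="ij")
    return mu.ravel(), nu.ravel()

def observe(mu, nu, mix):
    """Hypothetical observations that couple posture and view nonlinearly."""
    feats = np.stack([np.cos(mu), np.sin(mu), np.cos(nu), np.sin(nu),
                      np.cos(mu) * np.cos(nu), np.sin(mu) * np.sin(nu)], axis=-1)
    return feats @ mix

rng = np.random.default_rng(0)
# Orthonormal rows keep the synthetic observation map well conditioned.
mix = np.linalg.qr(rng.normal(size=(12, 6)))[0].T          # shape (6, 12)

# Supervised learning: torus coordinates are known for every training sample.
centers = flat_torus(*angle_grid(8))                       # 64 centers on the ideal torus
mu_tr, nu_tr = angle_grid(16)                              # 256 training samples
Y_tr = observe(mu_tr, nu_tr, mix)
B = np.linalg.lstsq(rbf_features(mu_tr, nu_tr, centers), Y_tr, rcond=None)[0]

def estimate(y, n=72):
    """Return the grid point (mu, nu) whose generated observation is closest to y."""
    mu_g, nu_g = angle_grid(n)
    Y_g = rbf_features(mu_g, nu_g, centers) @ B
    best = np.argmin(np.sum((Y_g - y)**2, axis=1))
    return mu_g[best], nu_g[best]

mu_true, nu_true = 1.0, 4.0
y_test = observe(np.array([mu_true]), np.array([nu_true]), mix)[0]
mu_hat, nu_hat = estimate(y_test)
print("estimated posture phase and view angle:", mu_hat, nu_hat)
```

In this sketch the estimate can only be as fine as the search grid (5 degree steps with `n=72`). A tracker would normally replace the exhaustive search with a local search or a probabilistic filter on the torus, which is outside the scope of this illustration.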



Demo Videos:











Selected Publications:


C.-S. Lee and A. Elgammal, “Coupled Visual and Kinematics Manifold Models for Human Motion Analysis,” International Journal of Computer Vision, Volume 87, Numbers 1-2, March 2010.

C.-S. Lee and A. Elgammal, “Tracking People on a Torus,” IEEE Transactions on Pattern Analysis and Machine Intelligence, March 2009.

C.-S. Lee and A. Elgammal, “Modeling View and Posture Manifold for Tracking,” International Conference on Computer Vision (ICCV), 2007.

A. Elgammal and C.-S. Lee, “Nonlinear Manifold Learning for Dynamic Shape and Dynamic Appearance,” Computer Vision and Image Understanding (CVIU), special issue on generative model based vision, April 2007.

C.-S. Lee and A. Elgammal, “Simultaneous Inferring View and Body Pose Using Torus Manifolds,” ICPR'06.

A. Elgammal and C.-S. Lee, “Separating Style and Content on a Nonlinear Manifold,” CVPR'04.

A. Elgammal and C.-S. Lee, “Inferring 3D Body Pose from Silhouettes using Activity Manifold Learning,” CVPR'04.