We examine the problem of segmenting tracked feature point trajectories of multiple moving objects in an image sequence. Using the affine camera model, this motion segmentation problem can be cast as the problem of segmenting samples drawn from a union of linear subspaces. In practice, due to limitations of the tracker, occlusions and the presence of nonrigid objects in the scene, obtained motion trajectories may contain grossly mistracked features, missing entries, or not correspond to any valid motion model. We develop a robust subspace separation scheme that can deal with all of these practical issues in a unified framework. Our methods draw strong connections between lossy compression, rank minimization, and sparse representation. We test our methods extensively and compare their performance to several extant methods with experiments on the Hopkins 155 database. Our results are on par with state-of-the-art results, and in many cases exceed them.
Suppose we are given trajectories of P tracked feature points of a rigid object from F 2-D image frames of a rigidly moving camera. The affine camera model stipulates that these tracked feature points are related to their 3-D coordinates by the matrix equation:
where is the affine motion matrix at frame f. The affine motion matrix is parameterized by the camera calibration matrix and the relative orientation of the rigid object w.r.t. the camera . From this formulation we see that
Thus the affine camera model postulates that trajectories of feature points from a single rigid motion will all lie in a linear subspace of of dimension at most four.
A dynamic scene can contain multiple moving objects, in which case the affine camera model for a single rigid motion cannot be directly applied. Now let us assume that the given P trajectories correspond to N moving objects. In this case, the set of all trajectories will lie in a union of N linear subspaces in , but we do not know which trajectory belongs to which subspace. Thus, the problem of assigning each trajectory to its corresponding motion is reduces to a problem of segmenting data drawn from multiple subspaces, and so we can apply our Lossy Compression-based clustering techniques.
Real motion trajectories acquired by a tracker can have many kinds of problems that can severely afffect the quality of segmentation. For example:
By combining our Lossy Compression-based clustering with techniques from sparse representation, we can make our motion segmentation method robust to these three kinds of pathological trajectories.
- A trajectory may correspond to certain nonrigid or random motions that do not obey the affine camera model (an outlying trajectory).
- Some of the features may be missing in some frames, causing a trajectory to have some missing entries (an incomplete trajectory).
- Even worse, some feature points may be mistracked (with the tracker unaware), causing a trajectory to have some entries with gross errors (a corrupted trajectory).
The Lossy Compression-based clustering method deals with outliers in an elegant fashion. In low dimensions a sufficient number of outliers will cover the entire space, and so the algorithm tends to group all outliers into a single group . Such a group can be easily detected, because the number of bits per vector in that group will be very large relative to other groups. However, in higher-dimensional spaces, such as in our motion segmentation problem, it would require an enormous number of outliers to fill the space. If outliers are thinly scattered in the ambient space, they will be most efficiently encoded when each outlier is its own group. Such small groups are also easily detectable.
Using techniques drawn from sparse representation, it is possible to fill in the missing entries in incomplete trajectories prior to segmentation.
Samples drawn from a low-dimensional subspace are self-expressive, meaning that a sample can be expressed in terms of a few other samples from the same subspace. More precisely, if the given sample is and is the data matrix whose columns are all of the other samples in the dataset, then there exists a coefficient vector that satisfies
This equation is highly underdetermined, as typically P >> D, and thus c is not unique. However, if y lies in a low-dimensional subspace S, it can be linearly represented by a few vectors from S. Hence, the c we seek should be sparse, having few nonzero entries. If c is sufficently sparse, then c is unique and can be recovered via the following linear program:
The sparse representation of y induced by c is highly robust. In particular, with probability one, c is preserved by an arbitrary orthogonal projection, provided the dimension of the projection is larger that the dimension of any of the subspaces. Now suppose y has some missing entries and Y is the subset of the data having no missing entries. Removing the rows of the missing entries from both y and Y is a projection, and so will preserve c if the number of missing entries is not too large. Thus we can obtain c in the projected space and use it in the above linear program to complete y.
Using techniques from sparse representation we can also detect and repair vectors with corrupted entries prior to segmentation.
A corrupted vector can be modeled as
where y is the uncorrupted vector, and is a vector that contains all of the gross errors. As long as there are enough uncorrupted vectors in the dataset, we can express y as a linear combination of the other vectors in the dataset as before. If is a matrix whose columns are the other vectors in the dataset, then this equation becomes
If the true c and e are sufficiently sparse, we can simultaneously find the sparsest c and e by solving the linear program :
Once w is computed, we decompose it into c and e, and use c to recover y.
Results on Hopkins 155 database
We tested the performance of our algorithm on the publicly available Hopkins 155 database. The results of these experiments are detailed below:
Results for each Experiment
Coming soon: Results for sequences with incomplete/corrupted trajectories.
Robust Motion Segmentation via Lossy Compression 1.0
The 'arm' sequence with projection down to R5
> addpath 'helpers';
> epsilon = logspace(-5,3,101);
> [rawData, trueLabels] = load_sequence('arm');
> processedData = process_sequence(rawData, true);
> result = try_sequence('arm', processedData, epsilon);
> computedLabels = find_best_segmentation(result, processedData, 2, epsilon);
> err = compare_labels(trueLabels, computedLabels);
Note: To use our L1-based methods for entry completion and error correction, the CVX package by Stephen Boyd must be installed and in the MATLAB path.
- Segmentation of Multivariate Mixed Data via Lossy Coding and Compression. Yi Ma, Harm Derksen, Wei Hong and John Wright. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.
- A Benchmark for the Comparison of 3-D Motion Segmentation Algorithms. Roberto Tron and Rene Vidal, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2007.
- Robust Face Recognition via Sparse Representation. John Wright, Allen Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma. To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Comments? Questions? Contact Shankar Rao at: srrao AT UIUC DOT EDU