A summary of results is given in a video from our TRICTRAC project.
To track articulated objects through complex occlusions, we developed methods based on point distribution models. The idea is to abandon segmentation of the tracked object and instead focus on identifiable points: point distribution models track a large number of points on the same target while exploiting the spatial configuration between them. This redundancy results in remarkable robustness. For the tracker to confuse different targets, the targets involved must simultaneously (1) look similar, (2) lie close to each other in the image, and (3) exhibit similar spatial configurations of tracked points.
While the first two conditions are rather common, the third often suffices to disambiguate different objects. Capitalizing on this is the key innovation of our method.
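As a toy illustration of this three-condition test, the sketch below checks whether two tracked targets are even confusable. It is only a schematic rendering of the idea; the `Target` structure, the simplified Procrustes comparison, and the thresholds are illustrative assumptions, not the published method.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Target:
    points: np.ndarray      # (N, 2) tracked point positions in the image
    histogram: np.ndarray   # appearance descriptor, e.g. a normalized color histogram

def configuration_distance(pts_a: np.ndarray, pts_b: np.ndarray) -> float:
    """Compare two equal-size point sets after removing translation and
    scale (a simplified Procrustes residual). Assumes point-to-point
    correspondence; the published models are more sophisticated."""
    a = pts_a - pts_a.mean(axis=0)
    b = pts_b - pts_b.mean(axis=0)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.linalg.norm(a - b))

def confusable(a: Target, b: Target,
               appear_thresh: float = 0.2,
               dist_thresh: float = 30.0,
               config_thresh: float = 0.3) -> bool:
    """The tracker can confuse two targets only if ALL three hold:
    (1) similar appearance, (2) nearby in the image,
    (3) similar spatial configuration of tracked points.
    Thresholds here are hand-picked for illustration."""
    similar_appearance = np.abs(a.histogram - b.histogram).sum() < appear_thresh
    nearby = np.linalg.norm(a.points.mean(axis=0) - b.points.mean(axis=0)) < dist_thresh
    similar_config = configuration_distance(a.points, b.points) < config_thresh
    return similar_appearance and nearby and similar_config
```

Because all three conditions must hold at once, two players of the same team who overlap in the image are still kept apart as long as their point configurations differ.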
The following movie shows a spectacular example of tracking soccer teammates through two- and three-way mutual occlusions (Mathes & Piater, 2005).
Our original point distribution models are extraordinarily stable if the set of points being tracked does not change too much over time. This condition is often violated in practice, e.g., if a tracked object turns in depth. We therefore developed adaptive point distribution models that update the set of tracked points on the fly.
The above movie shows an example of tracking several people through a highly dynamic scene. At the very end, Target 2 is incorrectly merged with another target and Target 3 is lost because the targets undergo strong configurational changes during occlusions. All other targets are correctly tracked through considerable configurational and scale changes as well as strong mutual occlusions (Mathes & Piater, 2006).
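The on-the-fly update can be pictured as pruning points whose tracking quality has degraded and recruiting fresh interest points detected inside the target region. The following sketch conveys only this general idea; the function name, the per-point scores, and the thresholds are assumptions, not the published algorithm.

```python
import numpy as np

def update_point_set(points: np.ndarray, scores: np.ndarray,
                     new_candidates: np.ndarray,
                     drop_thresh: float = 0.3, max_points: int = 50):
    """Sketch of an adaptive point-set update:
    - drop points whose matching score has degraded, e.g. because the
      object turned in depth and the point is no longer visible;
    - recruit fresh interest points detected inside the target region.
    `scores` are per-point matching qualities in [0, 1]; thresholds
    are illustrative, not the published parameters."""
    keep = scores >= drop_thresh
    points, scores = points[keep], scores[keep]
    room = max_points - len(points)
    if room > 0 and len(new_candidates) > 0:
        added = new_candidates[:room]
        points = np.vstack([points, added])
        scores = np.concatenate([scores, np.full(len(added), 1.0)])
    return points, scores
```

Pruning and recruiting in every frame keeps the model consistent with the object's current aspect, at the cost of occasional drift when too many points are exchanged during an occlusion, as seen at the end of the movie above.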
The soccer ball is very small in the image, moves fast, and changes direction abruptly; tracking it therefore requires dedicated techniques. In a pilot study, a student combined several such techniques intelligently, with remarkable results.
We estimate the external camera calibration with respect to the soccer field on the fly, by combining line tracking and visual odometry (Hayet, Piater & Verly, 2004). A video gives an entertaining illustration of this principle (here without visual odometry or temporal filtering, hence some residual jitter).
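The geometric core of such a calibration is a homography between the field plane and the image, estimated from correspondences such as field-line intersections. A minimal sketch using OpenCV follows; the field coordinates and function names are illustrative, and the published method additionally tracks the lines over time and fuses visual odometry.

```python
import numpy as np
import cv2

# Field-plane positions (in meters) of a few line intersections,
# e.g. the corners of a penalty area; the values are illustrative.
FIELD_POINTS = np.array([[0.0, 0.0],
                         [16.5, 0.0],
                         [16.5, 40.3],
                         [0.0, 40.3]], dtype=np.float32)

def field_to_image_homography(image_points: np.ndarray) -> np.ndarray:
    """Estimate the field-to-image homography from the image positions of
    known field-line intersections. `image_points` must correspond
    row-by-row to FIELD_POINTS; with more correspondences a robust
    estimator (e.g. RANSAC) would be preferable."""
    H, _ = cv2.findHomography(FIELD_POINTS, image_points.astype(np.float32))
    return H
```

Once the homography is known, image measurements such as player positions can be mapped onto field coordinates; tracking the lines frame to frame keeps the estimate current as the camera pans and zooms.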