Publication | Closed Access
Data Fusion for Visual Tracking With Particles
Citations: 466
References: 51
Year: 2004
Machine Vision, Image Analysis, Data Science, Particle Filtering, Pattern Recognition, Visual Tracking, Data Fusion, Eye Tracking, Engineering, Multi-sensor Information Fusion, Tracking System, Object Tracking, Stereo Sound, Computer Science, Moving Object Tracking, Visual Surveillance, Computer Vision
Particle filtering has transformed probabilistic object tracking by enabling the propagation of non-Gaussian distributions, which is especially valuable in visually ambiguous scenes. It also permits principled fusion of multiple measurement sources, a capability that has yet to be fully exploited in visual tracking. The study introduces generic importance-sampling mechanisms for fusing color with stereo sound in teleconferencing, or with motion in still-camera surveillance. Each cue is modeled with its own likelihood function, and the intermittent cues (sound or motion) are handled by generating proposal distributions from their likelihoods. Effective fusion of the cues via particle filtering is demonstrated on real teleconference and surveillance data.
The effectiveness of probabilistic tracking of objects in image sequences has been revolutionized by the development of particle filtering. Whereas Kalman filters are restricted to Gaussian distributions, particle filters can propagate more general distributions, albeit only approximately. This is of particular benefit in visual tracking because of the inherent ambiguity of the visual world that stems from its richness and complexity. One important advantage of the particle filtering framework is that it allows the information from different measurement sources to be fused in a principled manner. Although this fact has been acknowledged before, it has not been fully exploited within a visual tracking context. Here we introduce generic importance sampling mechanisms for data fusion and discuss them for fusing color with either stereo sound, for teleconferencing, or with motion, for surveillance with a still camera. We show how each of the three cues can be modeled by an appropriate data likelihood function, and how the intermittent cues (sound or motion) are best handled by generating proposal distributions from their likelihood functions. Finally, the effective fusion of the cues by particle filtering is demonstrated on real teleconference and surveillance data.
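The fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional state, the Gaussian random-walk dynamics, the Gaussian cue proposal, and all names and parameters (`fused_pf_step`, `color_lik`, `detection`, `alpha`, `dyn_std`, `det_std`) are assumptions made here for illustration. A persistent cue (standing in for color) is used as the likelihood, while an intermittent cue (standing in for sound or motion), when present, contributes a proposal component. Each particle is then weighted by likelihood times dynamics prior divided by proposal density.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def fused_pf_step(particles, weights, color_lik, detection, rng,
                  dyn_std=2.0, det_std=1.0, alpha=0.5):
    """One particle-filter step with a mixture proposal (illustrative only).

    particles, weights : 1-D arrays; weights must sum to 1
    color_lik          : callable mapping states to likelihood values
                         (hypothetical stand-in for the persistent cue)
    detection          : location suggested by the intermittent cue,
                         or None when that cue is absent in this frame
    alpha              : mixture weight of the dynamics component
    """
    n = particles.size
    # Resample so that the previous weights become uniform.
    prev = particles[rng.choice(n, size=n, p=weights)]
    # Candidate states from the assumed random-walk dynamics.
    from_dynamics = prev + rng.normal(0.0, dyn_std, n)

    if detection is None:
        # Cue absent: propose from the dynamics alone, so prior / proposal = 1.
        new = from_dynamics
        ratio = np.ones(n)
    else:
        # Cue present: draw each particle from the dynamics with probability
        # alpha, otherwise from a Gaussian centred on the detection.
        from_cue = detection + rng.normal(0.0, det_std, n)
        use_dynamics = rng.random(n) < alpha
        new = np.where(use_dynamics, from_dynamics, from_cue)
        prior = gaussian_pdf(new, prev, dyn_std)
        proposal = (alpha * prior
                    + (1.0 - alpha) * gaussian_pdf(new, detection, det_std))
        ratio = prior / np.maximum(proposal, np.finfo(float).tiny)

    # Importance weight: likelihood * prior / proposal, then normalise.
    w = color_lik(new) * ratio
    total = w.sum()
    if not np.isfinite(total) or total <= 0.0:
        return new, np.full(n, 1.0 / n)  # degenerate case: fall back to uniform
    return new, w / total
```

Because the weight divides by the full mixture density, particles drawn near the detection are not trusted blindly: they still have to be plausible under the dynamics and the persistent-cue likelihood. When the intermittent cue is missing, the step reduces to a standard bootstrap filter.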