Video-Based Object Recognition Using Novel Set-of-Sets Representations

Abstract

We address the problem of object recognition in egocentric videos, where a user arbitrarily moves a mobile camera around an unknown object. Using a video that captures variation in an object's appearance owing to camera motion (more viewpoints, scales, clutter and lighting conditions), can accumulate evidence and improve object recognition accuracy. Most previous work has taken a single image as input, or tackled a video simply by a collection i.e. sum of frame-based recognition scores. In this paper, beyond frame-based recognition, we propose two novel set-of-sets representations of a video sequence for object recognition. We combine the techniques of bag of words for a set of data spatially distributed thus heterogeneous, and manifold for a set of data temporally smooth and homogeneous, to construct the two proposed set-of-sets representations. We also propose methods to perform matching for the two representations respectively. The representations and matching techniques are evaluated on our video-based object recognition datasets, which contain 830 videos of ten objects and four environmental variations. The experiments on the challenging new datasets show that our proposed solution significantly outperforms the traditional frame-based methods.

References

Page 1

	Year	Citations

Page 1