Multi-sensor fusion for sport genre classification of user generated mobile videos

Abstract

We present a robust multimodal approach for classifying the sport genre of videos recorded by mobile phone users at sport events. In addition to traditional audio-visual content analysis tools, we propose to analyze auxiliary sensor data (electronic compass data and accelerometer data) captured simultaneously with the video recording. By means of machine learning techniques, we build models of visual appearance, camera motion (from auxiliary sensor data) and audio scene, which are used for classifying the data from each modality. The sport genre is obtained by fusing the information provided by the models. We propose to use the quality of each modality as an indication of its reliability. Extensive experiments were performed on real test data collected at public sport events. We provide comparisons of different modality sets and fusion methods. Finally, we show that the proposed methods achieve robust classification even in the considered unconstrained scenarios.
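To make the fusion idea concrete, the following is a minimal sketch of one common late-fusion scheme: a reliability-weighted sum of per-modality class scores. The abstract does not specify the fusion rule, the quality measures, or the genre labels, so everything here is an assumption made for illustration: the function names `fuse_scores` and `classify`, the genre labels, the scores, and the reliability values are all hypothetical and are not taken from the paper.

```python
def fuse_scores(modality_scores, reliabilities):
    """Reliability-weighted late fusion (illustrative, not the paper's rule).

    modality_scores: dict mapping modality name -> dict of class -> score
    reliabilities:   dict mapping modality name -> non-negative weight
    Returns a dict of class -> fused score.
    """
    total = sum(reliabilities[m] for m in modality_scores)
    if total <= 0:
        raise ValueError("at least one modality needs a positive reliability")

    # Use the class set of the first modality as the reference.
    classes = list(next(iter(modality_scores.values())).keys())
    fused = {c: 0.0 for c in classes}
    for modality, scores in modality_scores.items():
        weight = reliabilities[modality] / total  # normalize weights to sum to 1
        for c in classes:
            fused[c] += weight * scores[c]
    return fused


def classify(modality_scores, reliabilities):
    """Return the class with the highest fused score."""
    fused = fuse_scores(modality_scores, reliabilities)
    return max(fused, key=fused.get)


# Hypothetical per-modality classifier outputs for a single video.
scores = {
    "visual": {"soccer": 0.6, "tennis": 0.4},
    "sensor": {"soccer": 0.3, "tennis": 0.7},  # compass + accelerometer model
    "audio": {"soccer": 0.5, "tennis": 0.5},
}
# Hypothetical quality-derived reliabilities (e.g. low for a dark, shaky clip).
reliability = {"visual": 0.2, "sensor": 0.6, "audio": 0.2}

fused = fuse_scores(scores, reliability)
prediction = classify(scores, reliability)
```

In this made-up example the visual model alone favors one genre, but its low reliability lets the sensor-based model dominate the fused decision, which is the kind of behavior a quality-aware fusion scheme is meant to provide.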
