HMM based structuring of tennis videos using visual and audio cues

Abstract

This paper focuses on the use of hidden Markov models (HMMs) for structure analysis of videos, and demonstrates how they can be efficiently applied to merge audio and visual cues. Our approach is validated in the particular domain of tennis videos. The basic temporal unit is the video shot. Visual features describe the audio events within a video shot. The video structure parsing relies on the analysis of the temporal interleaving of video shots, with respect to prior information about tennis content and editing rules. As a result, typical tennis scenes are identified. In addition, each shot is assigned to a level in the hierarchy described in terms of point, game and set.

References

Page 1

	Year	Citations

Page 1