A hidden Markov model framework for video segmentation using audio and image features

TLDR

Typical video segmentation algorithms classify shot boundaries by computing an image-based distance between adjacent frames and comparing this distance to fixed, manually determined thresholds, while motion and audio information are used separately. This paper proposes a technique for segmenting video using hidden Markov models (HMM). The method segments video into shots, shot boundaries, and intra-shot camera movements, using image, audio, and motion features combined within an HMM framework, eliminating the need for manual thresholds. Testing on a video database demonstrated that the algorithm improves segmentation accuracy compared to standard threshold-based systems.
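The conventional threshold-based baseline described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the paper's code. It assumes frames are flat lists of 8-bit gray values and uses an L1 gray-level histogram difference as the image-based distance. The bin count, the threshold of 0.5, and the function names are hypothetical stand-ins for a manually tuned system.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized gray-level histogram of a frame of 8-bit pixel values."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // 256] += 1  # map 0..255 onto bin indices 0..bins-1
    return [c / len(frame) for c in counts]


def histogram_distance(a: Sequence[int], b: Sequence[int], bins: int = 16) -> float:
    """L1 distance between the normalized histograms of two frames (range 0..2)."""
    return sum(abs(x - y) for x, y in zip(histogram(a, bins), histogram(b, bins)))


def threshold_cuts(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Declare a cut before frame i when its distance to frame i-1 exceeds a fixed,
    hand-chosen threshold (hypothetical value); audio and motion are ignored."""
    return [i for i in range(1, len(frames))
            if histogram_distance(frames[i - 1], frames[i]) > threshold]


# Toy clip: two dark frames followed by two bright frames (abrupt cut before frame 2).
frames = [[10] * 100, [12] * 100, [200] * 100, [205] * 100]
cuts = threshold_cuts(frames)
print(cuts)
```

Every decision hinges on `threshold`, which must be picked by hand, and the audio and motion cues play no part in it. This is the limitation the HMM approach is meant to remove.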

Abstract

This paper describes a technique for segmenting video using hidden Markov models (HMM). Video is segmented into regions defined by shots, shot boundaries, and camera movement within shots. Features for segmentation include an image-based distance between adjacent video frames, an audio distance based on the acoustic difference in intervals just before and after the frames, and an estimate of motion between the two frames. Typical video segmentation algorithms classify shot boundaries by computing an image-based distance between adjacent frames and comparing this distance to fixed, manually determined thresholds. Motion and audio information is used separately. In contrast, our segmentation technique allows features to be combined within the HMM framework. Further, thresholds are not required since automatically trained HMMs take their place. This algorithm has been tested on a video database and has been shown to improve the accuracy of video segmentation over standard threshold-based systems.
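To show how an HMM can take the place of thresholds, here is a minimal sketch. Each frame transition yields a three-dimensional feature vector (image distance, audio distance, motion estimate). Each hidden state emits these vectors through a diagonal Gaussian, and Viterbi decoding picks the most probable state sequence. The three-state topology (`shot`, `boundary`, `camera_motion`), the Gaussian emission model, and all numeric parameters are hypothetical assumptions for illustration. The abstract does not specify them, and in the described system they would come from training rather than being set by hand.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical 3-state topology loosely following the regions named in the abstract.
STATES = ("shot", "boundary", "camera_motion")

# Hypothetical hand-set parameters; a trained system would estimate these from data.
START = {"shot": 0.8, "boundary": 0.1, "camera_motion": 0.1}
TRANS = {
    "shot": {"shot": 0.8, "boundary": 0.1, "camera_motion": 0.1},
    "boundary": {"shot": 0.9, "boundary": 0.05, "camera_motion": 0.05},
    "camera_motion": {"shot": 0.15, "boundary": 0.05, "camera_motion": 0.8},
}
# Feature order: (image distance, audio distance, motion estimate).
MEANS = {
    "shot": (0.1, 0.1, 0.1),
    "boundary": (1.5, 0.8, 0.2),
    "camera_motion": (0.5, 0.1, 1.0),
}
VARS = {s: (0.05, 0.05, 0.05) for s in STATES}


def log_gaussian(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def emit(state: str, x: Sequence[float]) -> float:
    """Log emission probability of feature vector x in the given state."""
    return log_gaussian(x, MEANS[state], VARS[state])


def viterbi(obs: Sequence[Sequence[float]]) -> List[str]:
    """Most probable state sequence for a list of per-transition feature vectors."""
    # delta[s]: best log-probability of any path ending in state s at the current step
    delta: Dict[str, float] = {s: math.log(START[s]) + emit(s, obs[0]) for s in STATES}
    backptrs: List[Dict[str, str]] = []
    for x in obs[1:]:
        new_delta, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for reaching s at this step
            prev = max(STATES, key=lambda p: delta[p] + math.log(TRANS[p][s]))
            ptr[s] = prev
            new_delta[s] = delta[prev] + math.log(TRANS[prev][s]) + emit(s, x)
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


observations = [
    (0.1, 0.1, 0.1), (0.12, 0.05, 0.1),   # steady shot
    (1.6, 0.9, 0.2),                      # large image and audio change
    (0.1, 0.1, 0.1),                      # steady shot
    (0.5, 0.1, 1.1), (0.55, 0.1, 0.9),    # high motion, little audio change
    (0.1, 0.1, 0.1),
]
print(viterbi(observations))
```

No threshold appears anywhere. A transition is labeled a boundary because its combined image, audio, and motion evidence fits that state's emission model better than the alternatives, weighed against how likely each state change is. In a trained system, the means, variances, and transition probabilities would be estimated from data (for example with Baum-Welch re-estimation) rather than hand-set as here.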
