Publication | Open Access
Video event detection and summarization using audio, visual and text saliency
Citations: 70
References: 10
Year: 2009
Unknown Venue
Natural Language Processing, Image Analysis, Visual Saliency, Engineering, Multi-modal Summarization, Pattern Recognition, Eye Tracking, Multimedia Analysis, Video Summarization, Audio Saliency, Text Saliency, Video Content Analysis, Video Understanding, Important Video Events, Video Event Detection, Video Retrieval, Computer Vision, Speech Recognition
Detection of perceptually important video events is formulated here on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. Audio saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color and motion. Text saliency is derived from part-of-speech tagging of the subtitle information available with most movie distributions. The modality curves are integrated into a single attention curve, where the presence of an event may be signified in one or multiple domains. This multimodal saliency curve is the basis of a bottom-up video summarization algorithm that refines results from unimodal or audiovisual-based skimming. The algorithm performs favorably for video summarization in terms of informativeness and enjoyability.
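The fusion and skimming steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the curves are combined or how segments are chosen, so everything here is an assumption made for illustration: min-max normalization, a weighted linear sum with equal weights, and selection of the highest-scoring frames up to a target ratio. The function names (`normalize`, `fuse`, `select_frames`, `to_segments`) and the `keep_ratio` parameter are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def normalize(curve: Sequence[float]) -> List[float]:
    """Min-max scale a saliency curve to [0, 1]; a flat curve maps to all zeros."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0] * len(curve)
    return [(v - lo) / (hi - lo) for v in curve]


def fuse(
    audio: Sequence[float],
    visual: Sequence[float],
    text: Sequence[float],
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> List[float]:
    """Combine three per-frame saliency curves into one attention curve.

    Assumption: a weighted linear sum of normalized curves. The paper may
    use a different fusion rule.
    """
    if not (len(audio) == len(visual) == len(text)):
        raise ValueError("all modality curves must have the same length")
    a, v, t = normalize(audio), normalize(visual), normalize(text)
    wa, wv, wt = weights
    return [wa * x + wv * y + wt * z for x, y, z in zip(a, v, t)]


def select_frames(saliency: Sequence[float], keep_ratio: float) -> List[int]:
    """Return the indices of the most salient frames, in temporal order."""
    n_keep = max(1, round(keep_ratio * len(saliency)))
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return sorted(ranked[:n_keep])


def to_segments(frames: Sequence[int]) -> List[Tuple[int, int]]:
    """Group consecutive frame indices into inclusive (start, end) segments."""
    segments: List[Tuple[int, int]] = []
    for f in frames:
        if segments and f == segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], f)
        else:
            segments.append((f, f))
    return segments


if __name__ == "__main__":
    # Toy per-frame curves, invented for the example.
    audio = [0.1, 0.9, 0.8, 0.2, 0.1, 0.1]
    visual = [0.0, 0.5, 1.0, 0.1, 0.0, 0.9]
    text = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
    curve = fuse(audio, visual, text)
    print(to_segments(select_frames(curve, keep_ratio=0.5)))
```

In this toy example a frame that is salient in only one modality (the last frame, visually) can still enter the skim, which mirrors the abstract's remark that an event may be signified in one or multiple domains. A practical skimming system would also smooth the curve and enforce minimum segment durations, which this sketch omits.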