Publication | Open Access
Towards robust features for classifying audio in the CueVideo system
Citations: 61 | References: 15 | Year: 1999
Unknown Venue
Music, Towards Robust Features, Engineering, Video Processing, Multimedia Analysis, Mixed Audio, Robust Feature, Speech Recognition, Natural Language Processing, Image Analysis, Pattern Recognition, Phonetics, Audio Analysis, Multimedia Mining, Health Sciences, Story Boundaries, Machine Vision, Cuevideo System, Audio Retrieval, Computer Vision, Audio Mining, Speech Processing, Speech Perception
The role of audio in multimedia applications involving video is becoming increasingly important. Many efforts in this area focus on audio data that contains some built-in semantic structure, such as broadcast news, or on classifying audio that contains a single type of sound, such as clear speech or clear music only. In the CueVideo system, we detect and classify mixed audio, i.e. combinations of speech and music together with other types of background sounds. Segmentation of mixed audio has applications in detecting story boundaries in video, spoken document retrieval systems, and audio retrieval systems, among others. We modify and combine audio features known to be effective in distinguishing speech from music, and examine their behavior on mixed audio. Our preliminary experimental results show that we can achieve a classification accuracy of over 80% for such mixed audio. Our study also provides several helpful insights into analyzing mixed audio in the context of real applications.