Publication | Closed Access
An acoustic segment modeling approach to query-by-example spoken term detection
Year: 2012
Citations: 59
References: 16
Unknown Venue
Engineering, Speech Corpus, Spoken Language Processing, Acoustic Segment Models, Corpus Linguistics, Speech Recognition, Natural Language Processing, Robust Speech Recognition, Health Sciences, Posteriorgram-based Template Matching, Acoustic Segment, Computer Science, Speech Communication, Speech Technology, Audio Mining, Speech Processing, Speech Input, Speech Perception, Linguistics, Query Term
Posteriorgram-based template matching has been shown to be successful for query-by-example spoken term detection (STD). This framework employs a tokenizer to convert query examples and test utterances into frame-level posteriorgrams, then applies dynamic time warping to match the query posteriorgrams against the test posteriorgrams and locate possible occurrences of the query term. Designing a reliable tokenizer is not trivial because test conditions are heterogeneous and training resources are limited. This paper presents a study of using acoustic segment models (ASMs) as the tokenizer. ASMs can be obtained through an unsupervised iterative procedure that requires no training transcriptions. The STD performance of the ASM tokenizer is evaluated on the Fisher Corpus and compared against three alternative tokenizers. Experimental results show that the ASM tokenizer outperforms a conventional GMM tokenizer and a language-mismatched phoneme recognizer. In addition, performance is significantly improved by applying unsupervised speaker normalization techniques.
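To illustrate the matching step described in the abstract, here is a minimal sketch of subsequence dynamic time warping over posteriorgrams. It is not the paper's implementation. The negative-log inner-product frame distance, the three-way step pattern, the normalization by query length, and the names `frame_distance` and `subsequence_dtw` are all assumptions chosen for illustration, and the toy posteriorgrams are made up.

```python
import math

def frame_distance(p, q, eps=1e-10):
    """Assumed local distance: negative log of the inner product
    between two posterior vectors (small when they agree)."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))

def subsequence_dtw(query, test):
    """Find the best-matching region of `test` for `query`.

    Both arguments are posteriorgrams: lists of per-frame posterior vectors.
    Returns (cost, start, end), where cost is the accumulated distance
    divided by the query length and start/end are inclusive test frame indices.
    """
    n, m = len(query), len(test)
    inf = float("inf")
    acc = [[inf] * m for _ in range(n)]    # accumulated cost
    start = [[0] * m for _ in range(n)]    # start frame of the path ending here

    # The query may begin at any test frame without penalty.
    for j in range(m):
        acc[0][j] = frame_distance(query[0], test[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            # Predecessors: vertical, diagonal, horizontal.
            best, best_start = acc[i - 1][j], start[i - 1][j]
            if j > 0:
                if acc[i - 1][j - 1] < best:
                    best, best_start = acc[i - 1][j - 1], start[i - 1][j - 1]
                if acc[i][j - 1] < best:
                    best, best_start = acc[i][j - 1], start[i][j - 1]
            acc[i][j] = frame_distance(query[i], test[j]) + best
            start[i][j] = best_start

    # The query may also end at any test frame.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc[n - 1][end] / n, start[n - 1][end], end

# Toy posteriorgrams over three hypothetical token classes.
A = [0.90, 0.05, 0.05]
B = [0.05, 0.90, 0.05]
C = [0.05, 0.05, 0.90]
cost, first, last = subsequence_dtw([A, B], [C, C, A, B, C])
print(cost, first, last)
```

In a detection system of this kind, the normalized cost would typically be compared against a threshold to decide whether the region counts as an occurrence of the query term. The threshold and any score calibration are outside the scope of this sketch.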