Publication | Closed Access
Spoken document retrieval from call-center conversations
Citations: 76 | References: 17 | Year: 2006
Spoken Document Retrieval, Engineering, Word Error Rate, Spoken Language Processing, Spoken Dialog System, Corpus Linguistics, Speech Recognition, Natural Language Processing, Information Retrieval, Data Science, Computational Linguistics, Robust Speech Recognition, Conversation Analysis, Voice Recognition, Conversational Speech Corpora, Language Studies, Real-time Language, Machine Translation, Speech Communication, Word Lattices, Speech Processing, Speech Input, Speech Perception, Linguistics
Retrieving information from conversational speech corpora, such as call-center data, is difficult because spontaneous speech and low recording quality lead to automatic speech recognition (ASR) word error rates of 30% or higher. The study aims to improve retrieval performance by exploiting word confusion networks (WCNs). It also analyzes how increasing ASR word error rate affects search effectiveness. Using WCNs improves mean average precision (MAP) over raw transcripts, and MAP remains reasonable even at very high error rates.
We are interested in retrieving information from conversational speech corpora, such as call-center data. These data consist of spontaneous conversations with low recording quality, which makes automatic speech recognition (ASR) highly difficult. For typical call-center data, even state-of-the-art large-vocabulary continuous speech recognition systems produce transcripts with a word error rate of 30% or higher. In addition to the output transcript, advanced systems provide word confusion networks (WCNs), a compact representation of word lattices that associates each word hypothesis with its posterior probability. Our work exploits the information provided by WCNs to improve retrieval performance. In this paper, we show that mean average precision (MAP) improves when WCNs are used instead of the raw word transcripts. Finally, we analyze the effect of increasing ASR word error rate on search effectiveness and show that MAP remains reasonable even at extremely high error rates.
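To make the two central notions in the abstract concrete, the following is a minimal Python sketch of a WCN and of the MAP measure. It is an illustration only: the abstract does not describe the paper's indexing or scoring scheme, so the toy WCN format (a list of slots, each mapping a word hypothesis to its posterior), the `expected_count` scoring function, and the example call IDs are all assumptions introduced here. Average precision follows its standard definition.

```python
from typing import Dict, List, Mapping, Sequence, Set

# Assumed toy format: a WCN is a list of slots, and each slot maps a
# word hypothesis to its posterior probability.
WCN = List[Dict[str, float]]


def one_best(wcn: WCN) -> List[str]:
    """Return the most probable word of each slot (the raw transcript)."""
    return [max(slot, key=slot.get) for slot in wcn]


def expected_count(wcn: WCN, term: str) -> float:
    """Sum the posterior of `term` over all slots (a soft term frequency).

    Hypothetical scoring choice for illustration, not the paper's method.
    """
    return sum(slot.get(term, 0.0) for slot in wcn)


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k that hold a relevant document."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(
    runs: Mapping[str, Sequence[str]], qrels: Mapping[str, Set[str]]
) -> float:
    """Average the per-query average precision over all judged queries."""
    return sum(average_precision(runs[q], qrels[q]) for q in qrels) / len(qrels)


# Invented example data: two short "calls" with competing hypotheses.
calls: Dict[str, WCN] = {
    "call_a": [{"my": 0.9, "by": 0.1},
               {"printer": 0.4, "winter": 0.6},
               {"broke": 1.0}],
    "call_b": [{"the": 1.0},
               {"winter": 0.9, "printer": 0.1}],
}

# Rank the calls for the query term "printer" by expected count.
ranking = sorted(calls,
                 key=lambda c: expected_count(calls[c], "printer"),
                 reverse=True)
```

In this toy data the 1-best transcript of `call_a` contains "winter" rather than "printer", so a search over raw transcripts would miss it, while the WCN still carries "printer" with posterior 0.4 and the call can be ranked. That is the intuition behind using WCNs for retrieval at high word error rates.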