Concepedia

Publication | Closed Access

Vocabulary independent spoken term detection

Citations: 185

References: 26

Year: 2007

TLDR

Retrieving information from speech data typically relies on large‑vocabulary ASR word transcripts, but queries outside the recognizer’s vocabulary are missed, and phonetic transcripts, though available, have lower accuracy. The authors aim to develop a vocabulary‑independent spoken term detection system that can handle arbitrary queries by combining word transcripts and phonetic transcripts. The system builds word confusion networks and phonetic lattices from ASR output, indexes them, and uses these structures for query processing and ranking. The system achieved the highest overall ranking for US English speech data in the recent NIST Spoken Term Detection evaluation, demonstrating its effectiveness.

Abstract

We are interested in retrieving information from speech data such as broadcast news, telephone conversations, and roundtable meetings. Today, most systems use large-vocabulary continuous speech recognition tools to produce word transcripts; the transcripts are indexed, and query terms are retrieved from the index. However, query terms that are not part of the recognizer's vocabulary cannot be retrieved, which reduces the recall of the search. In addition to the output word transcript, advanced systems also provide phonetic transcripts, against which query terms can be matched phonetically. Such phonetic transcripts suffer from lower accuracy and cannot replace word transcripts. We present a vocabulary-independent system that can handle arbitrary queries by exploiting the information provided by both word transcripts and phonetic transcripts. A speech recognizer generates word confusion networks and phonetic lattices. The transcripts are indexed for query processing and ranking purposes. The value of the proposed method is demonstrated by the relatively high performance of our system, which received the highest overall ranking for US English speech data in the recent NIST Spoken Term Detection evaluation.
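The routing idea in the abstract (use the word index for in-vocabulary queries, fall back to the phonetic index otherwise) can be illustrated with a small sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: plain 1-best token lists stand in for word confusion networks and phonetic lattices, matching is exact with no ranking, and the `LEXICON` pronunciation table, the sample data, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data. The spoken utterance is "visit to katmandu", but
# "katmandu" is outside the recognizer's vocabulary, so the word transcript
# contains a misrecognition while the phone transcript still carries the sounds.
WORDS = {"bn01": "visit to cat man do".split()}
PHONES = {"bn01": "v ih z ih t t uw k ae t m ae n d uw".split()}

# Hypothetical pronunciation table; a real system would likely need a
# grapheme-to-phoneme component to handle arbitrary query terms.
LEXICON = {
    "visit": "v ih z ih t".split(),
    "katmandu": "k ae t m ae n d uw".split(),
}


def build_index(transcripts):
    """Map each token to the set of (doc_id, position) pairs where it occurs."""
    index = defaultdict(set)
    for doc_id, tokens in transcripts.items():
        for pos, tok in enumerate(tokens):
            index[tok].add((doc_id, pos))
    return index


def find_sequence(index, tokens):
    """Return sorted (doc_id, start) pairs where tokens occur consecutively."""
    if not tokens:
        return []
    return sorted(
        (doc_id, start)
        for doc_id, start in index.get(tokens[0], ())
        if all((doc_id, start + i) in index.get(tok, ())
               for i, tok in enumerate(tokens))
    )


def search(query, word_index, phone_index, vocabulary, lexicon):
    """Use the word index for in-vocabulary queries, else the phone index."""
    words = query.lower().split()
    if all(w in vocabulary for w in words):
        return "word", find_sequence(word_index, words)
    # Out-of-vocabulary: expand the query to phones and match phonetically.
    phones = [p for w in words for p in lexicon[w]]
    return "phone", find_sequence(phone_index, phones)


word_index = build_index(WORDS)
phone_index = build_index(PHONES)
vocabulary = set(word_index)  # assumption: vocabulary = words seen in transcripts

print(search("visit", word_index, phone_index, vocabulary, LEXICON))
print(search("katmandu", word_index, phone_index, vocabulary, LEXICON))
```

In this toy setup the out-of-vocabulary query is found only through the phone index, which is the gap a word-only index leaves open. A system of the kind described would additionally need to handle recognition alternatives and confidence scores for ranking, which this sketch omits.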
