The SRI/OGI 2006 spoken term detection system

TLDR

The paper presents the SRI/OGI system for the 2006 NIST Spoken Term Detection evaluation and examines tradeoffs between performance and system design. The system has two phases. An offline audio-indexing phase uses SRI’s large-vocabulary speech-to-text (STT) system to convert speech into a searchable word-based index. A term-retrieval phase, which can run online, then returns a ranked list of occurrences for each search term. The authors evaluate performance against indexing speed, the effect of index ranking schemes on the NIST score, and out-of-vocabulary (OOV) handling across three English genres.
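The two-phase design can be illustrated with a toy word-based inverted index. This is a minimal sketch, not the SRI/OGI implementation. The following are all assumptions made for illustration:

- the `(file_id, word, start, duration, score)` input tuples, standing in for word hypotheses taken from STT lattices;
- the product-of-scores ranking;
- the hypothetical `max_gap` adjacency rule for multi-word terms.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One indexed word hypothesis (hypothetical format, not the paper's)."""
    file_id: str
    start: float     # start time in seconds
    duration: float  # duration in seconds
    score: float     # assumed posterior/confidence in [0, 1]


def build_index(hypotheses):
    """Offline phase: map each word to every place the recognizer hypothesized it.

    `hypotheses` is an iterable of (file_id, word, start, duration, score)
    tuples; this input format is an illustrative assumption.
    """
    index = defaultdict(list)
    for file_id, word, start, duration, score in hypotheses:
        index[word.lower()].append(Hit(file_id, start, duration, score))
    for hits in index.values():
        hits.sort(key=lambda h: (h.file_id, h.start))
    return index


def search(index, term, max_gap=0.5):
    """Online phase: return occurrences of a (possibly multi-word) term, best first.

    Each next word must be in the same file and start no earlier than the end
    of the previous word and at most `max_gap` seconds after it. An occurrence
    is scored by the product of its word scores (one simple ranking choice).
    Returns a list of (file_id, start, end, score) tuples.
    """
    words = term.lower().split()
    if not words or any(w not in index for w in words):
        return []  # an OOV word can never be found in a word-based index
    partial = [(h.file_id, h.start, h.start + h.duration, h.score)
               for h in index[words[0]]]
    for w in words[1:]:
        extended = []
        for file_id, start, end, score in partial:
            for h in index[w]:
                if h.file_id == file_id and 0.0 <= h.start - end <= max_gap:
                    extended.append((file_id, start, h.start + h.duration,
                                     score * h.score))
        partial = extended
    return sorted(partial, key=lambda m: m[3], reverse=True)
```

The index contains only words the recognizer can output. A query containing an OOV word therefore returns nothing, which is why OOV handling has to be treated as a separate problem.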

Abstract

This paper describes the system developed jointly at SRI and OGI for participation in the 2006 NIST Spoken Term Detection (STD) evaluation. We participated in the three genres of the English track: Broadcast News (BN), Conversational Telephone Speech (CTS), and Conference Meetings (MTG). The system consists of two phases. First, audio indexing, an offline phase, converts the input speech waveform into a searchable index. Second, term retrieval, possibly an online phase, returns a ranked list of occurrences for each search term. We used a word-based indexing approach, with the index obtained from SRI’s large vocabulary Speech-to-Text (STT) system. In addition to describing the submitted system and its performance on the NIST evaluation metric, we study the tradeoffs between performance and system design. We examine performance versus indexing speed, the effectiveness of different index ranking schemes as measured by the NIST score, and the utility of approaches for dealing with out-of-vocabulary (OOV) terms.
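The abstract reports results on the NIST evaluation metric and asks how index ranking schemes affect it. The sketch below assumes the Term-Weighted Value definition from the NIST 2006 STD evaluation plan:

TWV(θ) = 1 − mean over terms of [P_miss(term, θ) + β·P_FA(term, θ)], with β = 999.9.

- P_miss = 1 − N_correct/N_true.
- P_FA = N_false/(T_speech − N_true), where T_speech is the number of seconds of speech.
- Terms with no true occurrences are excluded.

The function name, the input format, and the use of a single global score threshold θ are hypothetical choices for illustration, not the scoring tool NIST distributed.

```python
def term_weighted_value(results, speech_seconds, threshold, beta=999.9):
    """Term-Weighted Value at one decision threshold (assumed STD 2006 definition).

    `results` maps each term to (n_true, scored), where n_true is the number
    of reference occurrences and `scored` is a list of (score, is_correct)
    pairs for the system's putative hits. Hits with score >= threshold are
    treated as YES decisions; the rest are discarded.
    """
    losses = []
    for n_true, scored in results.values():
        if n_true == 0:
            continue  # terms with no reference occurrences are excluded
        kept = [ok for score, ok in scored if score >= threshold]
        n_correct = sum(kept)          # True counts as 1
        n_false = len(kept) - n_correct
        p_miss = 1.0 - n_correct / n_true
        # Non-target trials approximated as one per second of speech.
        p_fa = n_false / (speech_seconds - n_true)
        losses.append(p_miss + beta * p_fa)
    if not losses:
        raise ValueError("no terms with reference occurrences")
    return 1.0 - sum(losses) / len(losses)
```

False alarms carry a large weight. In one hour of speech, a single false alarm costs about 999.9/3596 ≈ 0.28, the same as missing more than a quarter of a term's true occurrences. The way hit scores are ranked and mapped onto a threshold can therefore move the score substantially.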
