Evaluation of lexical models for Hungarian Broadcast speech transcription and spoken term detection

Abstract

In this paper, we re-evaluate morph-based (data-driven subword) and word-based lexical models for large-vocabulary continuous speech recognition of agglutinative languages. Since such speech recognition systems are applied mostly for information retrieval, we choose our evaluation metrics accordingly. A standard 3-gram language model with a one-million-word vocabulary is used for the word-based system, whereas the statistical morph-based models use smaller vocabularies and higher-order n-grams. To keep the approaches applicable in practice, the computation time and memory usage of each approach are kept below real time and 1.5 GB, respectively. The lexical modeling approaches are tested on Hungarian Broadcast News and Broadcast Conversation speech. In our setup, word-based models outperformed morph-based ones in terms of both word error rate and spoken term detection measures; nevertheless, a search cascade of the word and morph approaches significantly improved the spoken term detection results.
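The abstract does not spell out how the search cascade is built. One plausible reading, offered here only as a minimal sketch, is that a query term is first looked up in the index built from word-based recognition output, and the morph-based index is consulted only when the word index returns no hit (for example, for an out-of-vocabulary inflected form). The function name `cascade_search`, the hit format, the two index layouts, and the `segment` callback are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical hit format: (utterance id, start time in seconds).
Hit = Tuple[str, float]


def cascade_search(
    term: str,
    word_index: Dict[str, List[Hit]],
    morph_index: Dict[Tuple[str, ...], List[Hit]],
    segment: Callable[[str], Sequence[str]],
) -> List[Hit]:
    """Return hits from the word index, falling back to the morph index.

    Assumption: the cascade is a simple back-off. The real system may
    instead merge or rescore the hits of both indexes.
    """
    word_hits = word_index.get(term, [])
    if word_hits:
        return word_hits
    # Fallback: split the term into morphs and look up the morph sequence.
    return morph_index.get(tuple(segment(term)), [])


# Toy data for illustration only; not results from the paper.
word_index = {"ház": [("utt1", 3.2)]}
morph_index = {("ház", "ak", "ban"): [("utt2", 7.5)]}
toy_segments = {"házakban": ["ház", "ak", "ban"]}


def segment(term: str) -> Sequence[str]:
    # Stand-in for a data-driven morph segmenter.
    return toy_segments.get(term, [term])


in_vocab_hits = cascade_search("ház", word_index, morph_index, segment)
fallback_hits = cascade_search("házakban", word_index, morph_index, segment)
no_hits = cascade_search("kutya", word_index, morph_index, segment)
```

In this toy setup, "ház" is answered by the word index, while the inflected form "házakban" is missing from it and is recovered through the morph index. A back-off of this kind keeps the word system's hits where it has them and uses the morph system only to cover terms the word vocabulary cannot reach.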
