Concepedia

Abstract

Speech recognition of conversational speech is a difficult task. The performance levels on the Switchboard corpus had been in the vicinity of 70% word error rate. In this paper, we describe the results of applying a variety of modifications to our speech recognition system and we show their impact on improving the performance on conversational speech. These modifications include the use of more complex models, trigram language models, and cross-word triphone models. We also show the effect of using additional acoustic training on the recognition performance. Finally, we present an approach to dealing with the abundance of short words, and examine how the variable speaking rate found in conversational speech impacts on the performance. Currently, the level of performance is at the vicinity of 50% error, a significant improvement over recent levels.

References

YearCitations

Page 1