Publication | Closed Access
Spoken Term Detection for Turkish Broadcast News
Citations: 79
References: 12
Year: 2008
Unknown Venue
Engineering, Speech Corpus, Spoken Language Processing, Corpus Linguistics, OOV Queries, Text Mining, Speech Recognition, Natural Language Processing, Information Retrieval, Computational Linguistics, Robust Speech Recognition, Language Studies, Machine Translation, Spoken Term Detection, Speech Communication, Language Recognition, Speech Processing, Speech Input, Linguistics, Term Specific Thresholds, Term Detection
In this paper, we present a baseline spoken term detection (STD) system for Turkish broadcast news. The agglutinative structure of Turkish causes a high out-of-vocabulary (OOV) rate and increases the word error rate (WER) in automatic speech recognition. We try several approaches to reduce the negative effect of this on the STD system. Sub-word units are used to handle OOV queries, and lattice-based indexing is used to obtain different operating points and to handle high-WER cases. A recently proposed method for setting term-specific thresholds is also evaluated and extended so that we can choose an operating point suitable for our needs. The best results are obtained by using a cascade of word and sub-word lattice indices with term-specific thresholding.
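The cascade and the term-specific threshold mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the details, so everything below is an assumption: the indices are modelled as plain dictionaries from a query term to lattice hits with posterior scores, the cascade is read as "search the word index first, fall back to the sub-word index when it returns nothing", and the threshold is one common rule derived from a term-weighted-value style cost, beta * N / (T + (beta - 1) * N), where N is the sum of the term's posteriors and T is the amount of speech in seconds. It is not necessarily the method the paper evaluates. The names `Hit`, `term_threshold`, and `cascade_search` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    term: str
    start: float   # start time in seconds
    score: float   # lattice posterior in [0, 1]
    source: str    # "word" or "subword"


def term_threshold(scores, speech_seconds, beta):
    """Term-specific threshold (assumed rule, not taken from the paper).

    A hit with posterior p is accepted when its expected gain p / N
    outweighs its expected false-alarm cost beta * (1 - p) / (T - N),
    which rearranges to p > beta * N / (T + (beta - 1) * N).
    N is estimated as the sum of the term's posteriors.
    """
    n_true = sum(scores)
    return beta * n_true / (speech_seconds + (beta - 1.0) * n_true)


def cascade_search(term, word_index, subword_index, speech_seconds, beta=999.9):
    """Search the word index first; fall back to the sub-word index.

    beta is an illustrative default; changing it moves the operating
    point (larger beta penalises false alarms more heavily).
    """
    hits = word_index.get(term, [])
    if not hits:
        # Assumed cascade: OOV or unmatched queries go to the sub-word index.
        hits = subword_index.get(term, [])
    if not hits:
        return []
    threshold = term_threshold([h.score for h in hits], speech_seconds, beta)
    return [h for h in hits if h.score >= threshold]
```

In this sketch a frequent term receives a higher threshold than a rare one, because its summed posteriors N are larger. A real sub-word index would be queried with the term's sub-word sequence rather than the term string itself; the dictionary lookup only stands in for that step.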