Publication | Closed Access
Automatic transcription of Czech, Russian, and Slovak spontaneous speech in the MALACH project
Citations: 27
References: 4
Year: 2005
Unknown Venue
Engineering, Speech Corpus, Spoken Language Processing, Phonology, Language Processing, Text Mining, Speech Recognition, Natural Language Processing, Language Documentation, MALACH Project, Phonetics, Computational Linguistics, Automatic Recognition, Corpus Analysis, Language Studies, Machine Translation, Speech Synthesis, Automatic Transcription, Language Technology, Speech Output, Slovak Spontaneous Speech, Speech Communication, Speech Technology, Slovak Witnesses, Speech Analysis, Building LVCSR Systems, Speech Acoustics, Language Recognition, Speech Processing, Speech Perception, Standardized Lexicon, Linguistics
This paper describes the 3.5-year effort to build large-vocabulary continuous speech recognition (LVCSR) systems for the spontaneous speech of Czech, Russian, and Slovak witnesses of the Holocaust in the MALACH project. To process the colloquial, highly emotional, and heavily accented speech of elderly people, which contains many non-speech events, we have developed techniques that very effectively handle both non-speech events and colloquial and accented variants of uttered words. Manual transcripts, one of the main sources for language modeling, were automatically "normalized" using a standardized lexicon, which brought about a 2 to 3% reduction of the word error rate (WER). The subsequent interpolation of such LMs with models built from an additional collection (consisting of topically selected sentences from general text corpora) resulted in an additional performance improvement of up to 3%.
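The two language-modeling steps named in the abstract, lexicon-based normalization of transcripts and interpolation with a model built from additional text, can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon entries, the function names, the toy token lists, and the weight `lam` are all hypothetical, and unigram models stand in for whatever n-gram models the project used, purely to keep the example short.

```python
from collections import Counter

# Hypothetical colloquial-to-standard mapping (illustrative, not from the paper).
LEXICON = {"dobrej": "dobrý", "vokno": "okno"}

def normalize(tokens, lexicon=LEXICON):
    """Replace each colloquial variant with its standardized form, if listed."""
    return [lexicon.get(t, t) for t in tokens]

def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(p_a, p_b, lam):
    """Linear interpolation: lam * p_a(w) + (1 - lam) * p_b(w) over the joint vocabulary."""
    vocab = set(p_a) | set(p_b)
    return {w: lam * p_a.get(w, 0.0) + (1 - lam) * p_b.get(w, 0.0) for w in vocab}

# Toy data: a "transcript" with colloquial forms and an "additional" text sample.
transcripts = normalize(["vokno", "okno", "dobrej", "den"])
extra_text = ["okno", "den", "den", "noc"]

# lam = 0.5 is an arbitrary assumption; in practice it would be tuned on held-out data.
p_mix = interpolate(unigram_model(transcripts), unigram_model(extra_text), lam=0.5)
```

Normalizing first pools the counts of variant spellings under one standardized form, and the interpolated distribution still sums to one because it is a convex combination of two probability distributions.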