Publication | Closed Access
SailAlign: Robust long speech-text alignment
Citations: 103
References: 13
Year: 2011
Unknown Venue
Topics: Engineering, Long Chunks, Corpus Linguistics, Speech Recognition, Natural Language Processing, Computational Linguistics, Phonetics, Robust Speech Recognition, Voice Recognition, Language Studies, Speech-text Alignment, Machine Translation, Speech Synthesis, Computer Science, Speech Communication, Speech Technology, Read Speech, Speech Processing, Speech Input, Speech Perception, Linguistics
Long speech-text alignment can facilitate large-scale study of rich spoken language resources that have recently become widely accessible, e.g., collections of audio books or multimedia documents. For such resources, conventional Viterbi-based forced alignment often proves inadequate, mainly because of mismatched audio and text and/or noisy audio. In this paper, we present SailAlign, an open-source software toolkit for robust long speech-text alignment that circumvents these restrictions. It implements an adaptive, iterative speech recognition and text alignment scheme that allows for the processing of very long (and possibly noisy) audio and is robust to transcription errors. SailAlign is evaluated on artificially created long chunks of the TIMIT database. Audio is artificially contaminated with babble noise, and the corresponding transcriptions are corrupted at various levels. We present the corresponding word boundary detection results. Finally, we demonstrate the potential use of the software for the exploitation of audio books for the study of read speech.
Index Terms: speech-text alignment, open-source, software, imperfect transcriptions, adaptation, audio-books
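The abstract does not spell out how the text alignment step of the iterative scheme works. As a rough illustration only, the sketch below shows one common way such a step can be done: align the recognizer's word hypothesis against the reference transcript and keep long runs of matching words as trusted anchors, leaving the regions between them for a later pass. The function names, the `min_run` threshold, and the use of Python's `difflib` are all assumptions made for this example, not details taken from SailAlign.

```python
from difflib import SequenceMatcher


def find_anchors(transcript_words, hypothesis_words, min_run=3):
    """Return (transcript_start, hypothesis_start, length) for matched word runs.

    Hypothetical helper: runs shorter than min_run are treated as unreliable.
    """
    matcher = SequenceMatcher(a=transcript_words, b=hypothesis_words, autojunk=False)
    return [
        (m.a, m.b, m.size)
        for m in matcher.get_matching_blocks()
        if m.size >= min_run
    ]


def unaligned_gaps(anchors, n_transcript_words):
    """Return transcript word ranges (start, end) not covered by any anchor."""
    gaps, cursor = [], 0
    for start, _, length in anchors:
        if start > cursor:
            gaps.append((cursor, start))
        cursor = start + length
    if cursor < n_transcript_words:
        gaps.append((cursor, n_transcript_words))
    return gaps


# Toy data: the hypothesis contains recognition errors and a dropped final word.
transcript = "the quick brown fox jumps over the lazy dog near the river bank".split()
hypothesis = "the quick brown fox jumped over a lazy dog near the river".split()

anchors = find_anchors(transcript, hypothesis)
gaps = unaligned_gaps(anchors, len(transcript))
print(anchors)  # [(0, 0, 4), (7, 7, 5)]
print(gaps)     # [(4, 7), (12, 13)]
```

In an iterative setup, the gaps returned here would be the segments re-recognized in the next pass, possibly with adapted models, while the anchored regions keep the word timings from the recognizer.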