SailAlign: Robust long speech-text alignment

Abstract

Long speech-text alignment can facilitate large-scale study of rich spoken language resources that have recently become widely accessible, e.g., collections of audio books or multimedia documents. For such resources, conventional Viterbi-based forced alignment often proves inadequate, mainly because of mismatched audio and text and/or noisy audio. In this paper, we present SailAlign, an open-source software toolkit for robust long speech-text alignment that circumvents these restrictions. It implements an adaptive, iterative speech recognition and text alignment scheme that allows for the processing of very long (and possibly noisy) audio and is robust to transcription errors. SailAlign is evaluated on artificially created long chunks of the TIMIT database. The audio is artificially contaminated with babble noise, and the corresponding transcriptions are corrupted at various levels. We present the corresponding word boundary detection results. Finally, we demonstrate the potential use of the software for the exploitation of audio books for the study of read speech.

Index Terms: speech-text alignment, open-source, software, imperfect transcriptions, adaptation, audio-books
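
The text-alignment half of such a scheme can be illustrated with a small sketch. This is not SailAlign's code: it assumes the recognizer output is available as a list of words, uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever text aligner the toolkit uses, and treats runs of at least `min_run` matching words as reliably aligned. The names `find_anchors` and `unaligned_gaps`, the threshold, and the recognizer hypothesis are all hypothetical. In an iterative scheme, the reference regions left between the matched runs would be the ones passed to the next recognition pass.

```python
from difflib import SequenceMatcher


def find_anchors(reference, hypothesis, min_run=3):
    """Return (ref_start, hyp_start, length) for each run of at least
    min_run consecutive words shared by the transcript and the hypothesis."""
    matcher = SequenceMatcher(a=reference, b=hypothesis, autojunk=False)
    return [(m.a, m.b, m.size)
            for m in matcher.get_matching_blocks()
            if m.size >= min_run]


def unaligned_gaps(anchors, n_ref):
    """Return the reference word ranges [start, end) not covered by an anchor."""
    gaps, pos = [], 0
    for ref_start, _, length in anchors:
        if ref_start > pos:
            gaps.append((pos, ref_start))
        pos = ref_start + length
    if pos < n_ref:
        gaps.append((pos, n_ref))
    return gaps


# Reference transcript and a made-up recognizer hypothesis with two word errors.
reference = "she had your dark suit in greasy wash water all year".split()
hypothesis = "she had your dark suit and greasy wash water oil year".split()

anchors = find_anchors(reference, hypothesis)
gaps = unaligned_gaps(anchors, len(reference))
print(anchors)  # [(0, 0, 5), (6, 6, 3)]
print(gaps)     # [(5, 6), (9, 11)]
```

Requiring a minimum run length is a design choice in this sketch: an isolated matching word (here the final "year") is easy to match by chance, so it is left in a gap to be re-examined rather than trusted as an anchor.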
