Publication | Closed Access
A factor automaton approach for the forced alignment of long speech recordings
Citations: 38
References: 9
Year: 2009
Unknown Venue
Engineering, Machine Learning, Spoken Language Processing, Large Language Model, Phonology, Long Speech Recordings, Corpus Linguistics, Text Mining, Speech Recognition, Natural Language Processing, Factor Automaton, Computational Linguistics, Phonetics, Robust Speech Recognition, Constrained Language Model, Voice Recognition, Language Studies, Real-time Language, Machine Translation, Sequence Modelling, Speech Synthesis, Linguistics, Computer Science, Factor Automaton Approach, Signal Processing, Speech Communication, Forced Alignment, Speech Technology, Speech Processing, Speech Perception, Speech Translation
This paper addresses the problem of aligning long speech recordings to their transcripts. Previous work has focused on using highly tuned language models trained on the transcripts to reduce the search space. In this paper, we propose the use of a factor automaton, a well-known method for representing all substrings of a string. This automaton encodes a highly constrained language model trained on the transcripts. We show results competitive with n-gram models in several testing scenarios. Preliminary experiments show perfect alignments at a reduced computational load and with a smaller memory footprint than n-gram models.
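To make the idea of a factor automaton concrete, here is a minimal sketch that builds one over the words of a transcript. It uses the standard online suffix-automaton construction and treats every state as accepting, so it accepts exactly the contiguous word sequences of the transcript. This is an illustration only, not the authors' implementation: the class name `FactorAutomaton`, its methods, and the sample transcript are hypothetical, and the sketch leaves out the weights and decoder integration that a speech recognizer would need.

```python
from typing import Dict, List, Sequence


class FactorAutomaton:
    """Accepts exactly the contiguous word sequences (factors) of a transcript.

    Hypothetical illustration: a suffix automaton in which every state is
    treated as accepting, which makes it a factor automaton.
    """

    def __init__(self, words: Sequence[str]) -> None:
        self.next: List[Dict[str, int]] = [{}]  # outgoing transitions per state
        self.link: List[int] = [-1]             # suffix links
        self.length: List[int] = [0]            # longest factor reaching each state
        self._last = 0                          # state for the whole prefix read so far
        for word in words:
            self._extend(word)

    def _new_state(self, length: int, link: int, trans: Dict[str, int]) -> int:
        self.next.append(trans)
        self.link.append(link)
        self.length.append(length)
        return len(self.next) - 1

    def _extend(self, word: str) -> None:
        cur = self._new_state(self.length[self._last] + 1, -1, {})
        p = self._last
        # Add the new transition along the suffix-link chain.
        while p != -1 and word not in self.next[p]:
            self.next[p][word] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][word]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone keeps q's transitions but a shorter length.
                clone = self._new_state(
                    self.length[p] + 1, self.link[q], dict(self.next[q])
                )
                while p != -1 and self.next[p].get(word) == q:
                    self.next[p][word] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self._last = cur

    def accepts(self, words: Sequence[str]) -> bool:
        """Return True if `words` occurs contiguously in the transcript."""
        state = 0
        for word in words:
            state = self.next[state].get(word, -1)
            if state == -1:
                return False
        return True


# Hypothetical transcript, for illustration only.
transcript = "the cat sat on the mat".split()
fa = FactorAutomaton(transcript)
print(fa.accepts(["sat", "on", "the"]))  # a contiguous span of the transcript
print(fa.accepts(["the", "sat"]))        # words present, but not adjacent
```

Because decoding may start and stop at any state, a recognizer constrained by such an automaton can hypothesize any contiguous stretch of the transcript for a given audio segment, and nothing else. The number of states grows linearly with the transcript length (at most about twice the number of words), which is in keeping with the smaller memory footprint reported above.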