Publication | Open Access
Using conditional random fields for sentence boundary detection in speech
Citations: 107
References: 13
Year: 2005
Unknown Venue
Keywords: Engineering, Speech Corpus, Conditional Random Field, Spoken Language Processing, Corpus Linguistics, Text Mining, Speech Recognition, Natural Language Processing, Data Science, Text Segmentation, Hidden Markov Model, Computational Linguistics, Grammar, Voice Recognition, Language Studies, Machine Translation, Sentence Boundary Detection, Conditional Random Fields, Speech Communication, Speech Analysis, Speech Processing, Speech Input, Speech Perception, Linguistics, POS Tagging
Sentence boundary detection in speech is crucial for improving the readability of speech recognition output and for enabling downstream processing. Prior work used HMM and Maxent classifiers that combine textual and prosodic cues. This study evaluates a conditional random field (CRF) for sentence boundary detection and compares its performance with those earlier models. The CRF was evaluated on conversational telephone speech and broadcast news corpora, using both human transcriptions and automatic speech recognition output. The CRF generally achieved a lower error rate than the HMM and Maxent models on the NIST sentence boundary detection task, with the best performance obtained by three-way voting among the classifiers.
Sentence boundary detection in speech is important for enriching speech recognition output, making it easier for humans to read and for downstream modules to process. In previous work, we developed hidden Markov model (HMM) and maximum entropy (Maxent) classifiers that integrate textual and prosodic knowledge sources for detecting sentence boundaries. In this paper, we evaluate the use of a conditional random field (CRF) for this task and compare its results with those of our prior work. We evaluate across two corpora (conversational telephone speech and broadcast news speech) on both human transcriptions and speech recognition output. In general, our CRF model yields a lower error rate than the HMM and Maxent models on the NIST sentence boundary detection task in speech, although it is interesting to note that the best results are achieved by three-way voting among the classifiers. This probably occurs because each model has different strengths and weaknesses for modeling the knowledge sources.
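The three-way voting mentioned in the abstract can be illustrated with a minimal sketch. It assumes each classifier emits one hard binary decision per word (1 = sentence boundary after the word, 0 = no boundary) and that the combined decision is a simple majority. The function name `majority_vote` and the example label sequences are hypothetical, and the paper may combine the models differently (for example, by voting on posterior probabilities).

```python
def majority_vote(hmm, maxent, crf):
    """Combine three per-word boundary decisions by simple majority.

    Each argument is a sequence of 0/1 labels, one per word
    (1 = sentence boundary after the word). This label encoding
    is an assumption made for illustration.
    """
    if not (len(hmm) == len(maxent) == len(crf)):
        raise ValueError("all three label sequences must have the same length")
    # With three binary votes, a boundary wins when at least two models agree.
    return [int(h + m + c >= 2) for h, m, c in zip(hmm, maxent, crf)]


# Hypothetical decisions from the three classifiers for a four-word stretch.
hmm_labels = [0, 1, 0, 1]
maxent_labels = [0, 1, 1, 0]
crf_labels = [1, 1, 0, 0]

combined = majority_vote(hmm_labels, maxent_labels, crf_labels)
```

With three voters and binary labels there is never a tie, so every position receives a definite decision. An error made by one model is overruled whenever the other two agree, which is consistent with the abstract's suggestion that the models have complementary strengths.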