Publication | Closed Access
A robust parser for spoken language understanding
Citations: 655
References: 15
Year: 1999
Unknown Venue
Syntactic Parsing, Engineering, Phonology, Speech Recognition, Natural Language Processing, Syntax, Computational Linguistics, Phonetics, State Duration, Natural-sounding Speech, Robust Speech Recognition, Grammar, Voice Recognition, Language Studies, Spoken Language Understanding, Speech Synthesis, Speech Output, Computer Science, Shallow Parsing, Speech Communication, Speech Technology, Robust Parser, Pitch Parameter, Parsing, Treebanks, Speech Processing, Speech Perception, Linguistics
Describes an HMM‑based speech synthesis system that models spectrum, pitch, and state duration simultaneously in a unified HMM framework. The system models pitch with multi‑space probability distribution HMMs and state duration with multi‑dimensional Gaussian distributions, clusters the spectral, pitch, and duration distributions independently using decision‑tree context clustering, and generates synthetic speech via HMM‑based parameter generation and mel‑cepstrum vocoding. Informal listening tests indicate that the system produces natural‑sounding speech resembling the training speaker.
In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch, and state duration are modeled simultaneously in a unified HMM framework. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for the spectral parameter, the pitch parameter, and the state duration are clustered independently using a decision-tree based context clustering technique. Synthetic speech is generated using an algorithm for speech parameter generation from HMMs and a mel-cepstrum based vocoding technique. Through informal listening tests, we have confirmed that the proposed system successfully synthesizes natural-sounding speech that resembles the speaker in the training database.