Publication | Closed Access
Speech synthesis using HMMs with dynamic features
Citations: 128
References: 9
Year: 2002
Venue: Unknown
Engineering, Phonology, Speech Recognition, Phonetics, Robust Speech Recognition, Sentence HMM, Language Studies, Speech Synthesis, Linguistics, Speech Output, Computer Science, Text-to-speech, Speech Signal, Speech Communication, Speech Technology, Speech Processing, Speech Input, Speech Perception, Triphone HMMs
This paper presents a new text-to-speech synthesis system based on hidden Markov models (HMMs) that include dynamic features, i.e., delta and delta-delta parameters of speech. The system uses triphone HMMs as the synthesis units. The triphone HMMs share fewer than 2,000 clustered states, each of which is modelled by a single Gaussian distribution. For a given text to be synthesized, a sentence HMM is constructed by concatenating the triphone HMMs. Speech parameters are generated from the sentence HMM in such a way that the output probability is maximized. The speech signal is synthesized directly from the obtained parameters using the mel log spectral approximation (MLSA) filter. Without dynamic features, discontinuities in the generated speech spectra cause glitches in the synthesized speech. With dynamic features, on the other hand, the synthesized speech becomes quite smooth and natural even if the number of clustered states is small.
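The abstract does not spell out how the parameters are generated, so the following is only a minimal sketch of one common way to maximize the output probability under dynamic-feature constraints. It assumes a single one-dimensional parameter track, a fixed state sequence, diagonal covariances, and a delta feature only (no delta-delta); the function names, the central-difference delta window, and the toy values are illustrative assumptions, not taken from the paper. Under those assumptions the static trajectory c solves the linear system (WᵀPW)c = WᵀPμ, where W stacks the static and delta windows, μ holds the per-frame Gaussian means, and P holds the inverse variances.

```python
import numpy as np


def build_window_matrix(num_frames):
    """Stack static (identity) and delta rows into one (2T x T) matrix W.

    Assumption: delta is a central difference, 0.5 * (c[t+1] - c[t-1]),
    with indices clamped at the sequence edges.
    """
    static = np.eye(num_frames)
    delta = np.zeros((num_frames, num_frames))
    for t in range(num_frames):
        delta[t, min(t + 1, num_frames - 1)] += 0.5
        delta[t, max(t - 1, 0)] -= 0.5
    return np.vstack([static, delta])


def generate_trajectory(static_mean, static_var, delta_mean, delta_var):
    """Return the static trajectory that maximizes the Gaussian likelihood
    of the stacked [static; delta] observations for a fixed state sequence."""
    num_frames = len(static_mean)
    W = build_window_matrix(num_frames)
    mean = np.concatenate([static_mean, delta_mean])
    precision = 1.0 / np.concatenate([static_var, delta_var])
    # Normal equations of the weighted least-squares problem.
    A = W.T @ (precision[:, None] * W)
    b = W.T @ (precision * mean)
    return np.linalg.solve(A, b)


# Toy example (hypothetical values): two states with static means 0 and 1,
# five frames each, and delta means of 0 everywhere.
static_mean = np.array([0.0] * 5 + [1.0] * 5)
ones = np.ones(10)
smooth = generate_trajectory(static_mean, ones, np.zeros(10), 0.1 * ones)
```

In this toy setting, the static means alone form a step at the state boundary, which corresponds to the discontinuity described for the system without dynamic features. Tightening the delta variance pulls neighbouring frames toward each other and spreads the transition over several frames.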