Publication | Closed Access
Product of Experts for Statistical Parametric Speech Synthesis
Citations: 49
References: 36
Year: 2011
Engineering, Machine Learning, Speech Recognition, Data Science, Robust Speech Recognition, Machine Translation, Health Sciences, Gaussian Experts, PoE Framework, Speech Synthesis, Linguistics, Speech Output, Distant Speech Recognition, Speech Communication, Speech Technology, Speech Processing, Speech Input, Speech Perception, Multiple Acoustic Models
Multiple acoustic models are often combined in statistical parametric speech synthesis. Both linear and non-linear functions of an observation sequence are used as features to be modeled. This paper shows that this combination of multiple acoustic models can be expressed as a product of experts (PoE): the likelihoods from the models are scaled, multiplied together, and then normalized. Normally these models are individually trained and only combined at the synthesis stage. This paper discusses a more consistent PoE framework where the models are jointly trained. A training algorithm for PoEs based on linear feature functions and Gaussian experts is derived by generalizing the training algorithm for trajectory HMMs. However, for non-linear feature functions or non-Gaussian experts, this is not possible, so a scheme based on contrastive divergence learning is described. Experimental results show that the PoE framework provides both a mathematically elegant way to train multiple acoustic models jointly and significant improvements in the quality of the synthesized speech.
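The "scaled, multiplied together, and then normalized" combination can be illustrated in the simplest possible setting. The sketch below is not the paper's algorithm: it assumes scalar Gaussian experts that all model the same single value, with no feature functions and no observation sequence. The function names, weights, means, and variances are hypothetical and chosen only for illustration. In this special case the scaled product of Gaussians is itself proportional to a Gaussian, so normalization reduces to reading off a mean and a variance.

```python
import math


def product_of_gaussian_experts(means, variances, weights):
    """Combine 1-D Gaussian experts N(mean_i, var_i), each raised to weight_i.

    Illustrative assumption: every expert models the same scalar directly.
    The normalized product is a Gaussian whose precision is the weighted
    sum of the expert precisions.
    """
    # Each scaled expert contributes precision weight_i / var_i.
    precisions = [w / v for w, v in zip(weights, variances)]
    total_precision = sum(precisions)
    # The combined mean is the precision-weighted average of expert means.
    mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    return mean, 1.0 / total_precision


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


# Two hypothetical experts with unit weights and unit variances.
poe_mean, poe_var = product_of_gaussian_experts(
    means=[0.0, 2.0], variances=[1.0, 1.0], weights=[1.0, 1.0]
)
```

With these made-up inputs the combined expert sits midway between the two means and is sharper than either expert alone, which is the usual behavior of a product: each expert can veto values it considers unlikely.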