Voice characteristics conversion for HMM-based speech synthesis system

Abstract

We describe an approach to voice characteristics conversion for an HMM-based text-to-speech synthesis system. Since this speech synthesis system uses phoneme HMMs as speech units, voice characteristics conversion is achieved by changing the HMM parameters appropriately. To transform the voice characteristics of synthesized speech to the target speaker, we applied the maximum a posteriori estimation and vector field smoothing (MAP/VFS) algorithm to the phoneme HMMs. Using 5 or 8 sentences as adaptation data, speech samples synthesized from a set of adapted tied triphone HMMs, which have approximately 2,000 distributions, are judged to be closer to the target speaker by 79.7% or 90.6%, respectively, in an ABX listening test.

References

Page 1

	Year	Citations

Page 1