TD-PSOLA versus harmonic plus noise model in diphone based speech synthesis

Abstract

In an effort to select a speech representation for our next generation concatenative text-to-speech synthesizer, the use of two candidates is investigated; TD-PSOLA and the harmonic plus noise model, HNM. A formal listening test has been conducted and the two candidates have been rated regarding intelligibility, naturalness and pleasantness. Ability for database compression and computational load is also discussed. The results show that HNM consistently outperforms TD-PSOLA in all the above features except for computational load. HNM allows for high-quality speech synthesis without smoothing problems at the segmental boundaries and without buzziness or other oddities observed with TD-PSOLA.

References

Page 1

	Year	Citations

Page 1