Concepedia

TLDR

Describes an HMM-based speech synthesis system that models spectrum, pitch, and state duration simultaneously in a unified HMM framework. Pitch is modeled with multi-space probability distribution HMMs and state duration with multi-dimensional Gaussian distributions. The spectral, pitch, and duration distributions are clustered independently using decision-tree context clustering, and synthetic speech is generated with an HMM-based parameter generation algorithm and mel-cepstrum vocoding. Informal listening tests confirmed that the system produces natural-sounding speech resembling the training speaker.

Abstract

In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch, and state duration are modeled simultaneously in a unified HMM framework. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for the spectral parameters, pitch parameters, and state durations are clustered independently using a decision-tree based context clustering technique. Synthetic speech is generated using a speech parameter generation algorithm from HMMs and a mel-cepstrum based vocoding technique. Through informal listening tests, we have confirmed that the proposed system successfully synthesizes natural-sounding speech which resembles the speaker in the training database.
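
To illustrate why pitch calls for a multi-space probability distribution, the following is a minimal sketch rather than the paper's implementation. It assumes the common two-space setup: a one-dimensional Gaussian over log F0 for voiced frames, and a zero-dimensional space carrying only a weight for unvoiced frames. The function name `msd_f0_likelihood` and all parameter values are hypothetical, chosen only for illustration.

```python
import math


def msd_f0_likelihood(log_f0, voiced_weight, mean, variance):
    """Output probability of one pitch observation under a two-space MSD.

    Assumed setup (not taken from the abstract):
      - voiced space: 1-D Gaussian over log F0 with weight `voiced_weight`
      - unvoiced space: 0-D, contributes only its weight (1 - voiced_weight)
    `log_f0` is None for an unvoiced frame.
    """
    if not 0.0 <= voiced_weight <= 1.0:
        raise ValueError("voiced_weight must lie in [0, 1]")
    if variance <= 0.0:
        raise ValueError("variance must be positive")

    if log_f0 is None:
        # Unvoiced frame: no continuous value exists, only the space weight.
        return 1.0 - voiced_weight

    # Voiced frame: space weight times the Gaussian density of log F0.
    density = math.exp(-0.5 * (log_f0 - mean) ** 2 / variance) / math.sqrt(
        2.0 * math.pi * variance
    )
    return voiced_weight * density


# Hypothetical state parameters: 80% voiced, mean log F0 of 5.0, variance 0.04.
unvoiced_p = msd_f0_likelihood(None, 0.8, 5.0, 0.04)
voiced_p = msd_f0_likelihood(5.0, 0.8, 5.0, 0.04)
```

Because the unvoiced space has no continuous dimension, a single state can score both voiced and unvoiced frames without an artificial F0 value being assigned to unvoiced regions.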
