Additive Modeling of English F0 Contour for Speech Synthesis

Abstract

In this paper, we present an approach to modeling English fundamental frequency (F0) contours for speech synthesis. It is based on a statistical learning technique called additive models, which had previously been applied successfully to modeling Japanese F0 contours. To model English F0 contours, we define a three-layer additive model consisting of an intonational phrase component, a word-level component representing lexical stress types, and a pitch-accent component associated with accented syllables. These component functions are estimated simultaneously using a backfitting algorithm derived from a regularized least-squares error criterion defined on the model over the training data. The proposed method was trained and tested on a widely used ToBI-labeled speech corpus, and promising results were obtained.
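The abstract says the components are estimated jointly by backfitting under a regularized least-squares criterion. Below is a minimal Python sketch of that general technique. It is not the authors' implementation. For illustration only, it assumes each component is a table of constant log-F0 offsets indexed by a categorical label attached to each frame, such as a hypothetical phrase-position, stress-type, or accent-type label. It also assumes the criterion is sum_i (y_i - mu - sum_j f_j(c_ij))^2 + lam * sum_j sum_c f_j(c)^2, with an unpenalized intercept mu. The paper's actual component functions and regularizer may differ. The names `fit_component`, `backfit`, `predict`, and `lam` are hypothetical.

```python
from collections import defaultdict

# Assumption (illustrative): each component f_j is a per-label constant offset,
# and the penalty is a ridge term lam * sum_c f_j(c)**2. The paper's components
# may instead be functions of position or time within a phrase, word, or syllable.


def fit_component(residual, labels, lam):
    """Closed-form ridge fit of one per-label offset table.

    Minimizes sum_i (r_i - f(labels[i]))**2 + lam * sum_c f(c)**2,
    whose solution is f(c) = (sum of r_i with label c) / (count_c + lam).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r, c in zip(residual, labels):
        sums[c] += r
        counts[c] += 1
    return {c: sums[c] / (counts[c] + lam) for c in sums}


def backfit(y, label_sets, lam=1.0, max_iter=200, tol=1e-10):
    """Backfitting: cycle over the intercept and each component, refitting one
    block at a time to the partial residual left by all the others."""
    n, m = len(y), len(label_sets)
    comps = [{} for _ in range(m)]
    fitted = [[0.0] * n for _ in range(m)]  # f_j evaluated at every frame
    mu = 0.0
    for _ in range(max_iter):
        old_mu = mu
        # Unpenalized intercept: mean of what the components leave unexplained.
        mu = sum(y[i] - sum(fitted[k][i] for k in range(m)) for i in range(n)) / n
        max_change = abs(mu - old_mu)
        for j, labels in enumerate(label_sets):
            # Partial residual: remove the intercept and every other component.
            partial = [
                y[i] - mu - sum(fitted[k][i] for k in range(m) if k != j)
                for i in range(n)
            ]
            comps[j] = fit_component(partial, labels, lam)
            new = [comps[j][labels[i]] for i in range(n)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new, fitted[j])))
            fitted[j] = new
        if max_change < tol:
            break
    return mu, comps


def predict(mu, comps, labels):
    """Intercept plus one offset per component; unseen labels contribute 0."""
    return mu + sum(comp.get(lab, 0.0) for comp, lab in zip(comps, labels))
```

When lam > 0, each step exactly minimizes the assumed criterion over one block of parameters, so the objective never increases. This is why backfitting can estimate the layers together instead of fitting them one after another. Larger values of lam shrink every component toward zero and leave more of the contour to the intercept.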
