Optimal data selection for unit selection synthesis.

Abstract

In this work, we address the issue of creating a set of utterances with optimal coverage for reliable, high quality concatenative synthesis, whether for general synthesis or domain synthesis. We present an automatic method that takes into account the acoustic distinctions made by a particular speaker and selects prompts from large databases of typical utterances. A general unit selection text-to-speech system created by this process can synthesize any input text, but the output is best for content intended to be similar to that in the database in terms of style, delivery, and coverage.

1. Background

Unit selection synthesis, where appropriate sub-word units are selected from databases of natural speech, seems to hold the promise of high quality natural sounding speech synthesis. However, the quality of such systems is inherently related to the quality and appropriateness of the database from which the units are selected. In the extreme case, it has been shown [2] that if the databas...

References

Reference	Year	Citations
Classification and Regression Trees. Alexander Gordon, Leo Breiman, Jerome H. Friedman. Biometrics	1984	23.8K
Classification and Regression Trees. John Van Ryzin, Leo Breiman, Jerome H. Friedman. Journal of the American Statistical Association	1986	21K
Unit selection in a concatenative speech synthesis system using a large speech database. Andrew J. Hunt, Alan W. Black	2002	1.2K
Automatically clustering similar units for unit selection in speech synthesis. Alan W. Black, Paul Taylor	1997	282
Methods for optimal text selection. Jan P. H. van Santen, Adam L. Buchsbaum	1997	97
Task and Domain Specific Modeling in the Carnegie Mellon Communicator System. Alexander I. Rudnicky, Christina L. Bennett, Alan W. Black. Figshare	2012	45
Improvements in an HMM-based speech synthesiser. Robert E. Donovan, Philip C. Woodland	1995	31
Diphone collection and synthesis. Kevin Lenzo, Alan W. Black	2000	24
