TLDR

The new AT&T TTS system for general U.S. English text integrates components from AT&T Flextalk, Edinburgh’s Festival, and ATR’s CHATR. It combines Flextalk’s text normalization, letter-to-sound conversion, and optional baseline prosody generation with Festival’s modular infrastructure and a modified CHATR unit selection that uses system-generated prosody as a target rather than modifying the prosody of the selected units. Design decisions were based on formal subjective evaluations, which also favored a Harmonic-plus-Noise speech representation over standard TD-PSOLA.

Abstract

The new AT&T TTS system for general U.S. English text is based on best-choice components picked from the AT&T Flextalk TTS, the Festival System from the University of Edinburgh, and ATR’s CHATR system. From Flextalk, it employs text normalization, letter-to-sound, and (optionally) baseline prosody generation. Festival provides general software-engineering infrastructure (modularity) for easy experimentation and competitive evaluation of different algorithms or modules. CHATR’s unit selection was modified to guarantee the intelligibility of a good n-phone (n=2 would be diphone) synthesizer while improving significantly on perceived naturalness relative to Flextalk. Each decision made during the research and development phase of this system was based on formal subjective evaluations. For example, the best voice found in a test that compared TTS systems built from several speakers gave a 0.3-point head start (on a 5-point rating scale) in quality over the mean of all speakers. Similarly, using our Harmonic-plus-Noise speech representation gave us a 0.25-point advantage over standard TD-PSOLA. Finally, by not performing prosodic modifications on the units (other than some smoothing across concatenation points) and instead using the system-generated prosody as a target in unit selection, we gained 0.4 points in overall quality. In conclusion, the new system combines the best of rule-based and data-driven worlds in TTS technology to deliver on the long-standing promise of truly natural-sounding synthesis.
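
The idea of using system-generated prosody as a target in unit selection, instead of modifying the selected units, can be illustrated with a generic sketch. The code below is not the AT&T or CHATR implementation: the `Unit` and `Target` types, the cost functions, and the weights are all hypothetical, chosen only to show how a dynamic-programming search might trade off closeness to the target prosody against pitch discontinuities at concatenation points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A recorded database unit (hypothetical fields, for illustration only)."""
    name: str
    f0_start: float  # pitch at the left edge, in Hz
    f0_end: float    # pitch at the right edge, in Hz
    duration: float  # in seconds


@dataclass(frozen=True)
class Target:
    """Prosody predicted by the front end for one slot (assumed representation)."""
    f0: float
    duration: float


def target_cost(unit, target, w_f0=1.0, w_dur=100.0):
    # Distance between the unit's own prosody and the predicted prosody.
    # The weights are arbitrary assumptions, not values from the paper.
    mean_f0 = 0.5 * (unit.f0_start + unit.f0_end)
    return w_f0 * abs(mean_f0 - target.f0) + w_dur * abs(unit.duration - target.duration)


def join_cost(left, right):
    # Pitch discontinuity at the concatenation point.
    return abs(left.f0_end - right.f0_start)


def select_units(targets, candidates):
    """Viterbi search; candidates[i] lists the database units available for slot i."""
    # best[i][j] holds (cumulative cost, backpointer) for candidate j at slot i.
    best = [[(target_cost(u, targets[0]), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(u, targets[i]), back))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

In this sketch the selected units are returned unchanged, so their natural prosody is preserved; the predicted prosody only steers which units are chosen. A real system would use richer features and costs than the two pitch and duration terms assumed here.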
