Publication | Open Access

WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications

Year: 2016 | Citations: 1.1K | References: 29

TLDR

WORLD is a vocoder-based speech synthesis system developed to improve sound quality in real-time applications. Vocoders are widely used in speech research, but existing high-quality systems are computationally expensive and difficult to run in real time. WORLD aims to deliver both high sound quality and fast processing. It combines three analysis algorithms and one synthesis algorithm from the authors' prior work. Its performance was evaluated by comparing synthesized speech, including consonants, with natural speech, and by measuring its processing speed against conventional systems. WORLD outperformed existing systems in both sound quality and processing speed: it ran more than ten times faster, and its real-time factor (RTF) confirmed that it is fast enough for real-time use.

Abstract

A vocoder-based speech synthesis system, named WORLD, was developed in an effort to improve the sound quality of real-time applications using speech. Speech analysis, manipulation, and synthesis based on vocoders are used in many kinds of speech research. Although several high-quality speech synthesis systems have been developed, real-time processing has been difficult with them because of their high computational costs. The new system achieves both high sound quality and fast processing. It consists of three analysis algorithms and one synthesis algorithm proposed in our previous research. The effectiveness of the system was evaluated by comparing its output against natural speech, including consonants. Its processing speed was also compared with those of conventional systems. The results showed that WORLD was superior to the other systems in terms of both sound quality and processing speed. In particular, it was over ten times faster than the conventional systems, and its real-time factor (RTF) indicated that it was fast enough for real-time processing.
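The real-time factor is conventionally the processing time divided by the duration of the audio processed, so an RTF below 1 means the system runs faster than real time. The sketch below illustrates that arithmetic together with a simple speedup ratio (baseline time divided by candidate time). It is an assumption-laden illustration, not the paper's measurement protocol: `dummy_process` is a hypothetical stand-in for a vocoder analysis/synthesis pass, and the 16 kHz sampling rate and 2 s signal are arbitrary choices.

```python
import time
from typing import Callable, List


def real_time_factor(processing_seconds: float, num_samples: int, sample_rate: int) -> float:
    """RTF = processing time / duration of the audio that was processed."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return processing_seconds / audio_seconds


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def measure_rtf(process: Callable[[List[float]], object], signal: List[float], sample_rate: int) -> float:
    """Time a single call of `process` on `signal` and return its RTF."""
    start = time.perf_counter()
    process(signal)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(signal), sample_rate)


if __name__ == "__main__":
    fs = 16000
    signal = [0.0] * (2 * fs)  # 2 s of silence, a placeholder for real speech

    # Hypothetical stand-in for an analysis/synthesis pass; swap in a real vocoder call.
    def dummy_process(x: List[float]) -> List[float]:
        return [2.0 * v for v in x]

    rtf = measure_rtf(dummy_process, signal, fs)
    status = "faster than real time" if rtf < 1 else "slower than real time"
    print(f"RTF = {rtf:.4f} ({status})")
```

For example, processing 2 s of audio in 0.5 s gives an RTF of 0.25, and a system that needs 10 s where another needs 1 s is ten times slower. A single timed call is noisy; averaging over several runs and longer utterances gives a steadier estimate.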
