Publication | Closed Access
On the Analysis of Training Data for Wavenet-Based Speech Synthesis
Citations: 13
References: 16
Year: 2018
Unknown Venue
Engineering, Machine Learning, Wavenet Model, Phonology, Speech Recognition, Data Science, Phonetics, Wavenet-based Speech Synthesis, Robust Speech Recognition, Voice Recognition, Language Studies, Speech Synthesis, Speech Output, Sound Synthesis, Computer Science, Wavenet Trains, Speech Communication, Speech Technology, Artificial Noise, Speech Processing, Speech Input, Speech Perception
In this paper, we analyze how much training data a WaveNet-based speech synthesis method needs, and how consistent and accurate that data must be, to generate speech of good quality. We do this by adding artificial noise to the description of our training data and observing how well WaveNet trains and produces speech. More specifically, we add noise to both the phonetic segmentation and the annotation, and we also reduce the size of the training data by using fewer sentences when training a WaveNet model. We conducted MUSHRA listening tests and used objective measures to track speech quality across the experiments. We show that WaveNet retains high quality even after a small amount of noise (up to 10%) is added to the phonetic segmentation and annotation. A small degradation of speech quality was observed for our WaveNet configuration when only 3 hours of training data were used.
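The abstract does not say how the noise was generated, so the following is only a minimal sketch of the three kinds of degradation it mentions, under stated assumptions: a noise level is read as the fraction of items affected, segmentation noise is modeled as a bounded random shift of phone boundaries, and annotation noise as substitution by another phone. All function names, the `max_shift` parameter, and the phone inventory are hypothetical and are not taken from the paper.

```python
import random


def perturb_segmentation(boundaries, fraction, max_shift, seed=0):
    """Shift a random fraction of interior boundaries by up to max_shift seconds.

    Assumption: 'noise level' means the share of boundaries that get moved.
    """
    rng = random.Random(seed)
    noisy = list(boundaries)
    interior = list(range(1, len(boundaries) - 1))
    chosen = rng.sample(interior, round(fraction * len(interior)))
    for i in chosen:
        shifted = noisy[i] + rng.uniform(-max_shift, max_shift)
        # Clamp between the neighbours so the boundaries stay ordered.
        noisy[i] = min(max(shifted, noisy[i - 1]), noisy[i + 1])
    return noisy


def corrupt_annotation(labels, fraction, inventory, seed=0):
    """Replace a random fraction of phone labels with a different phone.

    Assumption: annotation errors are modeled as uniform substitutions.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    chosen = rng.sample(range(len(labels)), round(fraction * len(labels)))
    for i in chosen:
        alternatives = [p for p in inventory if p != labels[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


def subsample_sentences(sentences, keep, seed=0):
    """Keep a random subset of sentences to simulate a smaller corpus."""
    rng = random.Random(seed)
    return rng.sample(list(sentences), keep)


if __name__ == "__main__":
    bounds = [i / 10 for i in range(11)]      # hypothetical boundaries in seconds
    phones = ["a"] * 20                       # hypothetical phone labels
    print(perturb_segmentation(bounds, 0.10, 0.02))
    print(corrupt_annotation(phones, 0.10, ["a", "b", "c"]))
    print(subsample_sentences(range(100), 10))
```

Fixing the seed keeps each degraded data set reproducible, so that a model trained on it can be compared against the clean baseline in a listening test.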