Publication | Closed Access
Lombard speech synthesis using long short-term memory recurrent neural networks
Year: 2017 · Venue: Unknown · Citations: 12 · References: 28
Keywords: Engineering, Machine Learning, Spoken Language Processing, Recurrent Neural Network, Speech Recognition, Natural Language Processing, Hidden Markov Model, Phonetics, Language Studies, Machine Translation, Speech Synthesis, Linguistics, Speech Output, Computer Science, Lombard Speech Synthesis, Speech Technology, Lombard Speech Adaptation, Speech Communication, Speech Processing, Speech Input, Lombard Effect
In statistical parametric speech synthesis (SPSS), a few studies have investigated the Lombard effect, specifically by using hidden Markov model (HMM)-based systems. Recently, artificial neural networks have demonstrated promising results in SPSS, specifically by using long short-term memory recurrent neural networks (LSTMs). The Lombard effect, however, has not been studied in LSTM-based speech synthesis systems. In this study, we propose three methods for Lombard speech adaptation in LSTM-based speech synthesis. In particular, (1) we augment the linguistic input features with Lombard-specific information, (2) scale the hidden activations using the learning hidden unit contributions (LHUC) method, and (3) fine-tune the LSTMs trained on normal speech with a small amount of Lombard speech data. To investigate the effectiveness of the proposed methods, we carry out experiments using small (10 utterances) and large (500 utterances) Lombard speech data sets. Experimental results confirm the adaptability of the LSTMs, and similarity tests show that the LSTMs achieve significantly better adaptation performance than the HMMs in both the small and large data conditions.