Publication | Closed Access
Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation
Citations: 110
References: 26
Year: 2019
Unknown Venue
Speech Separation Task, Engineering, Normalization Model, Phonology, Fixed Accent, Speech Recognition, Robust Speech Recognition, Health Sciences, Hearing-impaired Speech, Speech Synthesis, Speech Output, Computer Science, Deep Learning, Text-to-speech, Speech Communication, Speech Technology, Voice, Multi-speaker Speech Recognition, Speech Processing, Speech Separation, Speech Input, Speech Perception
We describe Parrotron, an end-to-end-trained speech-to-speech conversion model that maps an input spectrogram directly to another spectrogram, without utilizing any intermediate discrete representation. The network is composed of an encoder, spectrogram and phoneme decoders, followed by a vocoder to synthesize a time-domain waveform. We demonstrate that this model can be trained to normalize speech from any speaker regardless of accent, prosody, and background noise, into the voice of a single canonical target speaker with a fixed accent and consistent articulation and prosody. We further show that this normalization model can be adapted to normalize highly atypical speech from a deaf speaker, resulting in significant improvements in intelligibility and naturalness, measured via a speech recognizer and listening tests. Finally, demonstrating the utility of this model on other speech tasks, we show that the same model architecture can be trained to perform a speech separation task.
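The data flow named in the abstract (encoder, spectrogram and phoneme decoders, vocoder) can be illustrated with a toy sketch. This is not the authors' implementation: every function name, dimension, and the fixed random linear maps standing in for trained networks are assumptions made for illustration only. The sketch also keeps the number of frames unchanged, which a real sequence-to-sequence model need not do.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]  # rows are time frames, columns are features


def _project(frames: Matrix, out_dim: int, seed: int) -> Matrix:
    """Hypothetical stand-in for a trained network: a fixed random linear map per frame."""
    rng = random.Random(seed)
    in_dim = len(frames[0])
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    return [[sum(w * x for w, x in zip(row, frame)) for row in weights] for frame in frames]


def encode(spectrogram: Matrix, hidden_dim: int = 8) -> Matrix:
    # Encoder: input spectrogram frames -> hidden states (hidden_dim is an assumed size).
    return _project(spectrogram, hidden_dim, seed=0)


def decode_spectrogram(states: Matrix, n_bins: int) -> Matrix:
    # Spectrogram decoder: hidden states -> output spectrogram frames.
    return _project(states, n_bins, seed=1)


def decode_phonemes(states: Matrix, n_phonemes: int = 5) -> List[int]:
    # Phoneme decoder: hidden states -> one phoneme index per frame (toy inventory of 5).
    scores = _project(states, n_phonemes, seed=2)
    return [frame.index(max(frame)) for frame in scores]


def vocode(spectrogram: Matrix, hop: int = 4) -> List[float]:
    # Placeholder vocoder: emits `hop` samples per frame by repeating the frame mean.
    waveform: List[float] = []
    for frame in spectrogram:
        waveform.extend([sum(frame) / len(frame)] * hop)
    return waveform


def convert(spectrogram: Matrix, hop: int = 4) -> Tuple[Matrix, List[int], List[float]]:
    # Spectrogram in, spectrogram out, with no discrete intermediate on the main path.
    states = encode(spectrogram)
    out_spec = decode_spectrogram(states, n_bins=len(spectrogram[0]))
    phonemes = decode_phonemes(states)
    waveform = vocode(out_spec, hop)
    return out_spec, phonemes, waveform


if __name__ == "__main__":
    toy_input = [[0.1 * (t + b) for b in range(6)] for t in range(10)]  # 10 frames, 6 bins
    out_spec, phonemes, waveform = convert(toy_input)
    print(len(out_spec), len(phonemes), len(waveform))
```

In this sketch both decoders read the same encoder states, and only the spectrogram decoder's output is passed to the vocoder; how the phoneme decoder is used in the actual model is not stated in the abstract.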