Publication | Closed Access
Direct Speech-to-Speech Translation with a Sequence-to-Sequence Model
Citations: 164
References: 30
Year: 2019
Unknown Venue
Engineering, Machine Learning, Baseline Cascade, Multilingual Pretraining, Corpus Linguistics, Speech Recognition, Natural Language Processing, Computational Linguistics, Language Studies, Machine Translation, Translated Speech, Speech Synthesis, Linguistics, Speech Output, Direct Speech-to-speech Translation, Text-to-speech Synthesis Model, Deep Learning, Text-to-speech, Neural Machine Translation, Speech Processing, Speech Translation
We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation. The network is trained end-to-end, learning to map speech spectrograms into target spectrograms in another language, corresponding to the translated content (in a different canonical voice). We further demonstrate the ability to synthesize translated speech using the voice of the source speaker. We conduct experiments on two Spanish-to-English speech translation datasets, and find that the proposed model slightly underperforms a baseline cascade of a direct speech-to-text translation model and a text-to-speech synthesis model, demonstrating the feasibility of the approach on this very challenging task.