Publication | Closed Access
Investigating Self-Supervised Pre-Training for End-to-End Speech Translation
37 Citations | 19 References | 2020 | Unknown Venue
LLM Fine-tuning, Engineering, Machine Learning, Automatic Speech Translation, Spoken Language Processing, Multilingual Pretraining, End-to-end Speech Translation, Speech Recognition, Natural Language Processing, Computational Linguistics, Self-supervised Learning, Language Studies, Real-time Language, Machine Translation, Linguistics, Computer Science, Deep Learning, Speech Communication, Neural Machine Translation, Speech Processing, Raw Speech, Speech Translation
Self-supervised learning from raw speech has been shown to improve automatic speech recognition (ASR). Here we investigate its impact on the performance of end-to-end automatic speech translation (AST). We use a contrastive predictive coding (CPC) model, pre-trained on unlabeled speech, as a feature extractor for a downstream AST task. We show that self-supervised pre-training is particularly effective in low-resource settings, and that fine-tuning the CPC models on the AST training data further improves performance. Even in higher-resource settings, ensembling AST models trained on filter-bank and CPC representations yields near state-of-the-art models without any ASR pre-training. This may be particularly beneficial when developing a system that translates speech in a language with poorly standardized orthography, or even speech in an unwritten language.

Index Terms: self-supervised learning from speech, automatic speech translation, end-to-end models, low-resource settings.
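For readers unfamiliar with CPC, the sketch below illustrates the contrastive (InfoNCE) objective from the original CPC formulation for a single prediction step. It is a minimal illustration under stated assumptions, not the pre-training code used in this work: the helper names `dot`, `matvec`, and `info_nce_loss` and the toy vectors are hypothetical, and the specific pre-trained model used here may rely on a different variant of the contrastive loss. A context vector c_t, summarizing audio up to frame t, is projected by a step-specific matrix W_k to predict the latent frame k steps ahead; the loss is low when the true future frame scores higher than distractor frames drawn from elsewhere in the audio.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Vector]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(W: Matrix, c: Vector) -> List[float]:
    """Multiply matrix W (given as a list of rows) by vector c."""
    return [dot(row, c) for row in W]


def info_nce_loss(W_k: Matrix, c_t: Vector, z_pos: Vector,
                  z_negs: List[Vector]) -> float:
    """InfoNCE loss for one context vector and one prediction offset k.

    Hypothetical helper illustrating the generic CPC objective (assumption:
    the pre-trained model used in the paper may use a variant of this loss).
    W_k:    step-specific projection applied to the context vector.
    c_t:    context vector summarizing the audio up to frame t.
    z_pos:  true latent frame at t + k (the positive).
    z_negs: distractor latent frames sampled from elsewhere (negatives).
    """
    prediction = matvec(W_k, c_t)
    # Bilinear scores z^T W_k c_t; the positive is placed at index 0.
    scores = [dot(z_pos, prediction)] + [dot(z, prediction) for z in z_negs]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(scores)
    log_partition = m + math.log(sum(math.exp(s - m) for s in scores))
    # Cross-entropy of the softmax over candidates, with the positive as target.
    return log_partition - scores[0]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    context = [1.0, 0.0]
    # Positive aligned with the prediction, negatives orthogonal to it.
    easy = info_nce_loss(identity, context, [5.0, 0.0], [[0.0, 1.0], [0.0, -1.0]])
    # Positive indistinguishable from the three negatives: loss equals log(4).
    hard = info_nce_loss(identity, context, [1.0, 0.0], [[1.0, 0.0]] * 3)
    print(f"easy: {easy:.4f}, hard: {hard:.4f}")
```

In CPC training this loss is averaged over time steps t and offsets k and minimized by gradient descent through the encoder and the context network, both of which this sketch omits. Once pre-trained, the context vectors (rather than filter-bank features) can serve as inputs to the AST encoder, either kept frozen or fine-tuned on the AST training data.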