Publication | Closed Access
Recent Developments on ESPnet Toolkit Boosted by Conformer
Citations: 194
References: 32
Year: 2021
Unknown Venue
Engineering, Machine Learning, Real-time Data, Corpus Linguistics, Speech Recognition, Natural Language Processing, Data Science, Computational Linguistics, Data Integration, Language Studies, Convolution-augmented Transformer, Real-time Language, Machine Translation, Benchmark Datasets, Knowledge Discovery, Speech Output, Computer Science, Deep Learning, Text-to-speech, Speech Communication, Computational Science, Espnet Toolkit Boosted, Open Source, Multi-speaker Speech Recognition, Live-streaming, Speech Separation, Speech Processing, Speech Input, Speech Translation
This study presents recent developments in the ESPnet toolkit centered on the Conformer architecture. The authors plan to release all-in-one recipes, built on open-source corpora and accompanied by pre-trained models, to reduce the resources needed to set up a state-of-the-art research environment. Experiments show that Conformer-based ESPnet models are competitive with or superior to Transformer models across ASR, ST, SS, and TTS, and they also yield a set of practical training tips.
In this study, we present recent developments on ESPnet: End-to-End Speech Processing toolkit, which mainly involve a recently proposed architecture called Conformer, the Convolution-augmented Transformer. This paper shows the results for a wide range of end-to-end speech processing applications, such as automatic speech recognition (ASR), speech translation (ST), speech separation (SS), and text-to-speech (TTS). Our experiments reveal various training tips and significant performance benefits obtained with the Conformer on different tasks. These results are competitive with, or even outperform, the current state-of-the-art Transformer models. We are preparing to release all-in-one recipes using open-source and publicly available corpora for all the above tasks, together with pre-trained models. Our aim for this work is to contribute to our research community by reducing the burden of preparing state-of-the-art research environments, which usually requires substantial resources.