Publication | Closed Access
ASR N-Best Fusion Nets
Citations: 58
References: 21
Year: 2021
Venue: Unknown
Keywords: Engineering, Machine Learning, Spoken Language Processing, Multilingual Pretraining, Speech Recognition, Natural Language Processing, Data Science, Fusion Network, Computational Linguistics, Fusion Learning, Robust Speech Recognition, Language Studies, Spoken Language Understanding, Machine Translation, Data Fusion, Computer Engineering, Computer Science, Deep Learning, Feature Fusion, Automatic Speech Recognition, Multi-speaker Speech Recognition, ASR Errors, Speech Processing, Speech Input, Linguistics
Current spoken language understanding (SLU) systems rely heavily on the best hypothesis (1-best) generated by automatic speech recognition (ASR), which serves as the input to downstream models such as natural language understanding (NLU) modules. However, errors and misrecognitions in the ASR 1-best pose challenges to NLU: without additional signals, NLU models usually cannot recover from ASR errors, leading to suboptimal SLU performance. This paper proposes a fusion network that jointly considers the ASR n-best hypotheses for enhanced robustness to ASR errors. In experiments on Alexa data, the model achieves a 21.71% error reduction on domain classification compared to a baseline trained on transcriptions.
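The core idea, pooling information across all n-best hypotheses rather than trusting the 1-best alone, can be sketched as follows. This is a minimal illustrative sketch, not the paper's architecture: the hashed bag-of-words encoder, mean pooling, and all names (`encode`, `fuse_nbest`, `classify`) are assumptions introduced here.

```python
# Minimal sketch of n-best fusion for domain classification.
# Assumption: hypotheses are encoded independently and mean-pooled
# before a linear classifier; the real model may fuse differently.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, DOMAINS = 1000, 16, 3
embed = rng.normal(size=(VOCAB, DIM))   # token embedding table (random for the sketch)
W = rng.normal(size=(DIM, DOMAINS))     # linear domain-classifier weights

def encode(hypothesis: str) -> np.ndarray:
    """Hashed bag-of-words sentence embedding: mean of token vectors."""
    ids = [hash(tok) % VOCAB for tok in hypothesis.split()]
    return embed[ids].mean(axis=0)

def fuse_nbest(hypotheses: list[str]) -> np.ndarray:
    """Pool the n-best encodings so a misrecognized 1-best can be
    compensated by alternative hypotheses (mean pooling here)."""
    return np.stack([encode(h) for h in hypotheses]).mean(axis=0)

def classify(hypotheses: list[str]) -> int:
    """Predict a domain index from the fused n-best representation."""
    logits = fuse_nbest(hypotheses) @ W
    return int(np.argmax(logits))

# Example n-best list where the 1-best contains an ASR error.
nbest = ["play the beetles", "play the beatles", "lay the beatles"]
domain = classify(nbest)
```

The design point the sketch illustrates is that the classifier sees evidence from every hypothesis, so a single misrecognized token in the 1-best no longer determines the prediction.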