Publication | Closed Access
Exploring ASR-free end-to-end modeling to improve spoken language understanding in a cloud-based dialog system
Citations: 63 | References: 40 | Year: 2017
Keywords: Engineering, Machine Learning, Spoken Language Processing, Spoken Dialog System, Corpus Linguistics, Speech Recognition, Natural Language Processing, Data Science, Computational Linguistics, Speech Interface, Conversation Analysis, Voice Recognition, Language Studies, Real-time Language, Spoken Language Understanding, Machine Translation, Dialogue Management, Cloud-based Dialog System, Dialog Systems, ASR Performance, Computer Science, Deep Learning, ASR-free End-to-end Modeling, Automatic Speech Recognition, Speech Processing, Speech Input, Linguistics
Spoken language understanding (SLU) in dialog systems is generally performed by a natural language understanding (NLU) model that operates on the hypotheses produced by an automatic speech recognition (ASR) system. However, when new spoken dialog applications are built from scratch in real user environments, which often have sub-optimal audio characteristics, ASR performance can suffer from factors such as scarce training data or a mismatch between training and test data. To address this issue, this paper proposes an ASR-free, end-to-end (E2E) modeling approach to SLU for a cloud-based, modular spoken dialog system (SDS). We evaluate the effectiveness of our approach on crowdsourced data collected from non-native English speakers interacting with a conversational language learning application. Experimental results show that our approach is particularly promising in situations with low ASR accuracy. Moreover, fusing the scores of the E2E system with those of a sophisticated CNN-based SLU system that uses more accurate ASR hypotheses further improves overall SLU accuracy from 85.6% to 86.5%.
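The abstract does not say how the E2E and CNN-based SLU scores are fused. The sketch below assumes the simplest late-fusion scheme: a weighted linear interpolation of per-intent scores from the two systems, followed by picking the highest-scoring intent. This is not necessarily the authors' method; the function names `fuse_scores` and `predict_intent`, the weight `e2e_weight`, and the example scores are all hypothetical.

```python
from typing import List, Sequence


def fuse_scores(
    pipeline_scores: Sequence[float],
    e2e_scores: Sequence[float],
    e2e_weight: float = 0.3,  # hypothetical interpolation weight, tuned on dev data in practice
) -> List[float]:
    """Linearly interpolate per-intent scores from an ASR+NLU pipeline and an E2E model.

    Assumes both systems score the same intent inventory in the same order.
    """
    if len(pipeline_scores) != len(e2e_scores):
        raise ValueError("both systems must score the same set of intents")
    if not 0.0 <= e2e_weight <= 1.0:
        raise ValueError("e2e_weight must lie in [0, 1]")
    return [
        (1.0 - e2e_weight) * p + e2e_weight * e
        for p, e in zip(pipeline_scores, e2e_scores)
    ]


def predict_intent(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring intent (argmax)."""
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Hypothetical scores over three intents for one utterance.
    pipeline = [0.20, 0.45, 0.35]  # CNN-based SLU on ASR hypotheses
    e2e = [0.10, 0.20, 0.70]       # ASR-free E2E model
    fused = fuse_scores(pipeline, e2e, e2e_weight=0.3)
    print(predict_intent(pipeline), predict_intent(fused))  # prints "1 2"
```

In this made-up example the pipeline alone prefers intent 1, but the confident E2E score shifts the fused decision to intent 2. Setting `e2e_weight` to 0 recovers the pipeline's scores unchanged, so the weight controls how much the ASR-free system can override a possibly noisy ASR-based prediction.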