Publication | Closed Access
The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices
Citations: 236
References: 34
Year: 2015
Unknown Venue
Research Community, NTT CHiME-3 System, Engineering, Machine Learning, Speech Enhancement, Speech Recognition, Data Science, Pattern Recognition, Robust Speech Recognition, Voice Recognition, Health Sciences, Spectral Masks, CHiME-3 System, Computer Science, Deep Learning, Distant Speech Recognition, Signal Processing, Speech Communication, Speech Technology, CHiME Challenge, Speech Processing, Speech Input, Speech Perception, Mobile Multi-microphone
CHiME-3 is a research community challenge organised in 2015 to evaluate speech recognition systems for mobile multi-microphone devices used in noisy daily environments. This paper describes NTT's CHiME-3 system, which integrates advanced speech enhancement and recognition techniques. Newly developed techniques include the use of spectral masks for acoustic beam-steering vector estimation and acoustic modelling with deep convolutional neural networks based on the "network in network" concept. In addition to these improvements, our system has several key differences from the official baseline system. The differences include multi-microphone training, dereverberation, and cross adaptation of neural networks with different architectures. The impacts that these techniques have on recognition performance are investigated. By combining these advanced techniques, our system achieves a 3.45% development error rate and a 5.83% evaluation error rate. Three simpler systems are also developed to perform evaluations with constrained set-ups.
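The abstract mentions using spectral masks to estimate the beam-steering vector. The sketch below illustrates one common way such an idea can be realised for a single frequency bin; it is not taken from the paper. It assumes that a time-frequency mask is already available (the paper's mask estimator is not reproduced here), takes the principal eigenvector of the mask-weighted spatial covariance as the steering vector, and plugs it into a standard MVDR beamformer. All function names are hypothetical.

```python
import numpy as np


def masked_spatial_covariance(stft_bin, mask):
    """Mask-weighted spatial covariance for one frequency bin.

    stft_bin: complex array of shape (channels, frames).
    mask:     real array of shape (frames,), values in [0, 1].
    """
    weighted = stft_bin * mask  # the mask broadcasts over channels
    return weighted @ stft_bin.conj().T / max(mask.sum(), 1e-10)


def steering_vector(cov_speech):
    """Assumption: use the principal eigenvector as the steering vector."""
    _, eigvecs = np.linalg.eigh(cov_speech)  # eigenvalues in ascending order
    return eigvecs[:, -1]


def mvdr_weights(cov_noise, steer):
    """Standard MVDR weights: R_n^{-1} h / (h^H R_n^{-1} h)."""
    numerator = np.linalg.solve(cov_noise, steer)
    return numerator / (steer.conj() @ numerator)


def apply_beamformer(weights, stft_bin):
    """Enhanced single-channel output y_t = w^H x_t for every frame."""
    return weights.conj() @ stft_bin
```

In a full system these steps would run independently for every frequency bin, with one mask selecting speech-dominant frames and its complement selecting noise-dominant frames. Implementations often subtract the noise covariance from the speech-dominant covariance before the eigendecomposition; that refinement is omitted here for brevity.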