Publication | Closed Access
Learning Spectral Mapping for Speech Dereverberation and Denoising
Citations: 276
References: 42
Year: 2015
Engineering, Machine Learning, Speech Intelligibility, Speech Enhancement, Speech Recognition, Data Science, Noise, Robust Speech Recognition, Clean Speech, Health Sciences, Computer Science, Spectral Mapping, Deep Learning, Distant Speech Recognition, Signal Processing, Speech Communication, Multi-speaker Speech Recognition, Speech Processing, Speech Separation, Speech Dereverberation, Speech Perception
In real‑world settings, reverberation and background noise degrade speech intelligibility, quality, and the performance of speech technologies, necessitating effective dereverberation and denoising. The study proposes a supervised learning approach to jointly dereverberate and denoise speech. Deep neural networks are trained to map corrupted speech spectrograms directly to clean spectrograms. Experiments show that the method markedly reduces reverberation and noise, improves intelligibility, quality, and ASR performance, and outperforms related techniques.
In real-world environments, human speech is usually distorted by both reverberation and background noise, which have negative effects on speech intelligibility and speech quality. They also cause performance degradation in many speech technology applications, such as automatic speech recognition. Therefore, the dereverberation and denoising problems must be dealt with in daily listening environments. In this paper, we propose to perform speech dereverberation using supervised learning, and the supervised approach is then extended to address both dereverberation and denoising. Deep neural networks are trained to directly learn a spectral mapping from the magnitude spectrogram of corrupted speech to that of clean speech. The proposed approach substantially attenuates the distortion caused by reverberation, as well as background noise, and is conceptually simple. Systematic experiments show that the proposed approach leads to significant improvements of predicted speech intelligibility and quality, as well as automatic speech recognition in reverberant noisy conditions. Comparisons show that our approach substantially outperforms related methods.
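The spectral-mapping idea in the abstract can be illustrated with a toy sketch: a small network is fit to map spectrogram frames of a corrupted signal to the corresponding frames of the clean signal. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic tone standing in for speech, the random decaying impulse response standing in for room reverberation, the log compression, the two-frame context window, the single hidden layer, and all function names and hyperparameters.

```python
import numpy as np


def log_magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Log-magnitude spectrogram from a Hann-windowed short-time FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Small offset avoids log(0); shape is (n_frames, frame_len // 2 + 1).
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-6)


def add_context(features, context=2):
    """Concatenate each frame with `context` neighbouring frames on each side."""
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(features)]
                      for i in range(2 * context + 1)])


def corrupt(clean, rng, rir_len=200, noise_std=0.05):
    """Toy corruption (assumption): decaying random impulse response plus white noise."""
    rir = rng.standard_normal(rir_len) * np.exp(-np.arange(rir_len) / 40.0)
    rir[0] = 1.0  # direct path
    reverberant = np.convolve(clean, rir)[:len(clean)]
    return reverberant + noise_std * rng.standard_normal(len(clean))


def standardize(a):
    """Zero-mean, unit-variance scaling per feature dimension."""
    return (a - a.mean(axis=0)) / (a.std(axis=0) + 1e-8)


def train_mapping(x, y, hidden=64, epochs=300, lr=0.01, seed=0):
    """One-hidden-layer ReLU regressor, full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((x.shape[1], hidden)) * np.sqrt(2.0 / x.shape[1])
    b1 = np.zeros(hidden)
    w2 = rng.standard_normal((hidden, y.shape[1])) * np.sqrt(1.0 / hidden)
    b2 = np.zeros(y.shape[1])
    losses = []
    for _ in range(epochs):
        h = np.maximum(0.0, x @ w1 + b1)       # hidden activations
        err = h @ w2 + b2 - y                  # prediction error
        losses.append(float(np.mean(err ** 2)))
        grad_pred = 2.0 * err / err.size       # d(MSE)/d(prediction)
        grad_h = (grad_pred @ w2.T) * (h > 0)  # backprop through ReLU
        w2 -= lr * (h.T @ grad_pred)
        b2 -= lr * grad_pred.sum(axis=0)
        w1 -= lr * (x.T @ grad_h)
        b1 -= lr * grad_h.sum(axis=0)
    return (w1, b1, w2, b2), losses


def demo():
    """Build corrupted/clean training pairs from a synthetic signal and fit the mapping."""
    rng = np.random.default_rng(0)
    t = np.arange(8000) / 8000.0
    clean = (np.sin(2 * np.pi * 220 * t) * (1 + np.sin(2 * np.pi * 3 * t))
             + 0.5 * np.sin(2 * np.pi * 660 * t))
    corrupted = corrupt(clean, rng)
    x = standardize(add_context(log_magnitude_spectrogram(corrupted)))
    y = standardize(log_magnitude_spectrogram(clean))
    _, losses = train_mapping(x, y)
    return x, y, losses
```

Calling `demo()` returns the input features, the clean targets, and the per-epoch training loss. A real system would train on recorded or simulated reverberant noisy speech and would need a phase estimate (for example, the corrupted signal's phase) to resynthesize a waveform from the predicted magnitudes; this sketch stops at the spectral domain.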