Publication | Closed Access
Low-Latency Single Channel Speech Enhancement Using U-Net Convolutional Neural Networks
Citations: 29
References: 34
Year: 2020
Unknown Venue
Engineering, Machine Learning, Speech Enhancement, Speech Recognition, Single-channel Noisy Speech, Noise, Robust Speech Recognition, Clean Speech, Health Sciences, Single-channel Speech Enhancement, Computer Engineering, Computer Science, Deep Learning, Distant Speech Recognition, Signal Processing, Speech Communication, Multi-speaker Speech Recognition, Speech Processing, Speech Separation, Speech Perception
Single-channel speech enhancement (SE) can be described, in its simplest terms, as learning a transformation from single-channel noisy speech to clean speech. To do this, we propose a simple but effective U-Net convolutional neural network (CNN) architecture with skip connections, aimed at real-time applications that require low-latency processing. To that end, we process relatively small temporal windows and apply time-frequency (T-F) featurization to them in order to estimate the magnitude. Two state-of-the-art systems are picked for benchmarking: one operating in the spectral domain [1] and the other in the temporal domain [2]. We evaluate the systems in terms of perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). Experimental results show that, in terms of PESQ, the proposed method provides around 27% and 11% relative improvement over the two baseline systems, respectively, and has significantly lower latency than either. We further investigate the trade-off between performance and overall latency of the proposed system.
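The window-based processing and the two quantities discussed in the abstract (latency and relative improvement) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the window length, the exact T-F featurization, or the network details, so the function names, the single-FFT-per-window simplification, the Hann window, and the reuse of the noisy phase for reconstruction are all assumptions made here for illustration. The `estimate_magnitude` argument is a hypothetical stand-in for the trained U-Net.

```python
import numpy as np


def algorithmic_latency_ms(window_samples: int, sample_rate: int) -> float:
    """Time needed to buffer one analysis window, in milliseconds."""
    return 1000.0 * window_samples / sample_rate


def relative_improvement(proposed: float, baseline: float) -> float:
    """Relative gain of a score over a baseline score, e.g. 0.27 for 27%."""
    return (proposed - baseline) / baseline


def enhance_window(noisy: np.ndarray, estimate_magnitude) -> np.ndarray:
    """Enhance one short window of noisy speech.

    Assumed pipeline (not confirmed by the source): Hann window -> FFT ->
    magnitude estimate -> recombine with the noisy phase -> inverse FFT.
    `estimate_magnitude` is a placeholder for the trained network.
    """
    windowed = noisy * np.hanning(len(noisy))
    spectrum = np.fft.rfft(windowed)
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    enhanced_magnitude = estimate_magnitude(magnitude)
    enhanced_spectrum = enhanced_magnitude * np.exp(1j * phase)
    return np.fft.irfft(enhanced_spectrum, n=len(noisy))
```

As an illustrative calculation with numbers that are not taken from the paper, `algorithmic_latency_ms(256, 16000)` gives 16 ms of buffering delay for a 256-sample window at 16 kHz. Shrinking the window lowers this delay but gives the estimator less temporal context, which is the kind of performance-versus-latency trade-off the abstract refers to.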