Publication | Closed Access
Single channel speech enhancement based on masking properties of the human auditory system
Citations: 588 | References: 19 | Year: 1999
Topics: Engineering, Speech Enhancement, Human Auditory System, Noise Reduction, Speech Recognition, Speech Coding, Audio Signal Processing, Noise, Audio Analysis, Robust Speech Recognition, Health Sciences, Auditory Processing, Subjective Evaluation, Multi-channel Processing, Signal Processing, Speech Communication, Residual Noise, Speech Processing, Speech Separation, Speech Perception, Auditory System
Single-channel subtractive-type speech enhancement algorithms trade noise reduction against speech distortion and musical residual noise, and their fixed, optimized parameters are hard to choose across varying speech and noise conditions. This paper addresses single-channel speech enhancement at very low SNRs (<10 dB). The authors introduce a computationally efficient, auditory-model-based subtractive enhancement that automatically adapts its subtraction parameters in time and frequency to optimize a perception-correlated criterion. The method significantly reduces residual noise, yields more pleasant speech in listening tests, and improves speech-recognition performance over classical subtractive-type algorithms.
This paper addresses the problem of single-channel speech enhancement at very low signal-to-noise ratios (SNRs) (<10 dB). The proposed approach introduces an auditory model into a subtractive-type enhancement process. Single-channel subtractive-type algorithms are characterized by a tradeoff between the amount of noise reduction, the speech distortion, and the level of musical residual noise, which can be modified by varying the subtraction parameters. Classical algorithms are usually limited to fixed optimized parameters, which are difficult to choose for all speech and noise conditions. A new computationally efficient algorithm is developed based on masking properties of the human auditory system. It adapts the parameters of the enhancement system automatically in time and frequency, and finds the best tradeoff based on a criterion correlated with perception. This significantly reduces the unnatural structure of the residual noise. The proposed system is evaluated objectively and subjectively with several noise types from the Noisex-92 database that have different time-frequency distributions. Objective measures, inspection of the speech spectrograms, and subjective listening tests confirm that the enhanced speech is more pleasant to a human listener. Finally, the proposed enhancement algorithm is tested as a front-end processor for speech recognition in noise, where it yields better results than classical subtractive-type algorithms.
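The abstract does not spell out the subtraction rule or the adaptation law, so the following is only a minimal sketch under stated assumptions. It assumes a generalized (Berouti-style) spectral subtraction with an oversubtraction factor alpha and a spectral floor beta, both interpolated linearly per frequency bin according to a precomputed masking threshold: where the threshold is low (residual noise would be audible) subtraction is aggressive, and where it is high (noise is masked by speech) subtraction is gentle, which limits speech distortion. The function names, the parameter bounds (alpha from 1 to 6, beta from 0 to 0.02), and the exponents gamma1 = 2 and gamma2 = 0.5 (power subtraction) are illustrative assumptions. The auditory-model computation of the masking threshold itself is not shown; `mask_thr` is taken as given.

```python
def adapt_parameter(threshold, t_min, t_max, p_at_tmin, p_at_tmax):
    """Map a masking threshold linearly onto a subtraction parameter.

    A threshold at t_min (noise easily audible) yields p_at_tmin;
    a threshold at t_max (noise well masked) yields p_at_tmax.
    """
    if t_max <= t_min:
        # Degenerate frame with a flat threshold: use the gentle setting
        # (an arbitrary choice for this sketch).
        return p_at_tmax
    # Clamp into [t_min, t_max], then interpolate linearly.
    t = min(max(threshold, t_min), t_max)
    frac = (t - t_min) / (t_max - t_min)
    return p_at_tmin + frac * (p_at_tmax - p_at_tmin)


def subtract_frame(noisy_mag, noise_mag, mask_thr,
                   alpha_range=(1.0, 6.0), beta_range=(0.0, 0.02),
                   gamma1=2.0, gamma2=0.5):
    """Generalized spectral subtraction on one frame of magnitude bins.

    noisy_mag: |Y(w)| per bin; noise_mag: estimated |N(w)| per bin;
    mask_thr: precomputed masking threshold per bin (assumed given).
    Returns the enhanced magnitude per bin.
    """
    # Per-frame extremes of the masking threshold drive the adaptation.
    t_min, t_max = min(mask_thr), max(mask_thr)
    alpha_min, alpha_max = alpha_range
    beta_min, beta_max = beta_range
    enhanced = []
    for y, n, t in zip(noisy_mag, noise_mag, mask_thr):
        # Low threshold -> large alpha and beta (aggressive); high -> small.
        alpha = adapt_parameter(t, t_min, t_max, alpha_max, alpha_min)
        beta = adapt_parameter(t, t_min, t_max, beta_max, beta_min)
        diff = y ** gamma1 - alpha * n ** gamma1
        floor = beta * n ** gamma1
        # Keep the subtracted value unless it falls below the spectral floor.
        enhanced.append(max(diff, floor) ** gamma2)
    return enhanced
```

For example, `subtract_frame([10.0, 5.0, 1.0], [1.0, 1.0, 1.0], [0.0, 0.5, 1.0])` uses alpha = 6 in the first bin, 3.5 in the second, and 1 in the third, where the noise is most strongly masked. Linear interpolation is the simplest monotone mapping; any other monotone decreasing function of the threshold would express the same tradeoff.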