Publication | Closed Access
An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers
Citations: 564
References: 37
Year: 2016
Engineering, Speech Intelligibility, Speech Enhancement, Acoustic Modeling, Speech Recognition, Speech Coding, Modulated Noise Maskers, Intelligibility Subspaces, Noise, Speech Masked, Robust Speech Recognition, Orthogonal Decomposition, Statistics, Health Sciences, Computer Engineering, Computer Science, Intelligibility Listening Tests, Distant Speech Recognition, Signal Processing, Speech Communication, Speech Technology, Speech Separation, Speech Processing, Speech Perception
Intelligibility listening tests are essential for evaluating speech processing algorithms, yet they are costly and time-consuming. The authors propose a monaural intelligibility prediction algorithm that could replace some of these tests. The algorithm, extended STOI (ESTOI), resembles the short-time objective intelligibility (STOI) method but handles a broader range of input signals. ESTOI drops the assumption that frequency bands are mutually independent and compares complete 400 ms spectrograms of the noisy or processed speech and the clean speech. As a result, it accurately predicts intelligibility for speech masked by temporally highly modulated noise as well as for noisy signals processed with time-frequency weighting. ESTOI can be interpreted as an orthogonal decomposition of short-time spectrograms into intelligibility subspaces, and a free MATLAB implementation is available for noncommercial use.
Intelligibility listening tests are necessary during the development and evaluation of speech processing algorithms, even though they are expensive and time-consuming. In this paper, we propose a monaural intelligibility prediction algorithm that has the potential to replace some of these listening tests. The proposed algorithm is similar to the short-time objective intelligibility (STOI) algorithm but works for a larger range of input signals. In contrast to STOI, extended STOI (ESTOI) does not assume mutual independence between frequency bands. ESTOI also incorporates spectral correlation by comparing complete 400 ms spectrograms of the noisy/processed speech and the clean speech signals. As a consequence, ESTOI can accurately predict the intelligibility of speech contaminated by temporally highly modulated noise sources, in addition to noisy signals processed with time-frequency weighting. We show that ESTOI can be interpreted in terms of an orthogonal decomposition of short-time spectrograms into intelligibility subspaces, i.e., a ranking of spectrogram features according to their importance to intelligibility. A free MATLAB implementation of the algorithm is available for noncommercial use at http://kom.aau.dk/~jje/.
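The idea of comparing complete short-time spectrograms, rather than scoring each frequency band independently, can be illustrated with a small sketch. This is not the authors' MATLAB code. It assumes that each segment is a bands-by-frames matrix of envelope values, that each band is first normalized over time and each frame is then normalized across bands, and that the score is the mean inner product of corresponding frames. These normalization steps and the names `segment_similarity` and `_normalize` are illustrative assumptions, not details given in the abstract.

```python
import math


def _normalize(vectors):
    """Subtract the mean from each vector and scale it to unit Euclidean norm."""
    out = []
    for v in vectors:
        mean = sum(v) / len(v)
        centered = [x - mean for x in v]
        norm = math.sqrt(sum(x * x for x in centered))
        # A constant vector has no variation to compare; map it to zeros.
        out.append([x / norm for x in centered] if norm > 0 else [0.0] * len(v))
    return out


def _row_col_normalize(segment):
    """Normalize each band over time, then each frame across bands.

    Returns a list of frames (columns of the segment), each a unit-norm vector.
    """
    rows = _normalize(segment)                 # per band, along time
    frames = [list(col) for col in zip(*rows)]  # transpose to frames
    return _normalize(frames)                  # per frame, across bands


def segment_similarity(clean, degraded):
    """Hypothetical per-segment score in [-1, 1] for two bands-by-frames segments.

    Because frames are compared as whole vectors across bands, correlation
    between bands contributes to the score (assumed, simplified formulation).
    """
    c = _row_col_normalize(clean)
    d = _row_col_normalize(degraded)
    total = sum(sum(a * b for a, b in zip(cf, df)) for cf, df in zip(c, d))
    return total / len(c)


# Toy segment: 3 bands x 4 frames (a real segment would span about 400 ms).
clean = [[1.0, 2.0, 4.0, 3.0],
         [2.0, 1.0, 0.0, 5.0],
         [0.0, 3.0, 1.0, 2.0]]
time_reversed = [row[::-1] for row in clean]

same_score = segment_similarity(clean, clean)
reversed_score = segment_similarity(clean, time_reversed)
```

In this sketch a segment compared with itself scores 1 (up to rounding), and a distorted segment scores lower. A full predictor would average such scores over all segments of the signal.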