An algorithm that improves speech intelligibility in noise for normal-hearing listeners

TLDR

Traditional noise-suppression algorithms improve speech quality but not intelligibility. Motivated by prior intelligibility studies of the ideal binary mask, the study proposes an algorithm that decomposes the input signal into time-frequency (T-F) units and uses a Bayesian classifier to decide whether each unit is dominated by the target or the masker. Speech corrupted by various maskers at low SNRs (-5 and 0 dB) was processed by the algorithm and presented to normal-hearing listeners for identification. Intelligibility increased by over 60 percentage points in -5 dB babble relative to unprocessed speech, suggesting that algorithms that reliably estimate the SNR in each T-F unit can improve intelligibility.

Abstract

Traditional noise-suppression algorithms have been shown to improve speech quality, but not speech intelligibility. Motivated by prior intelligibility studies of speech synthesized using the ideal binary mask, an algorithm is proposed that decomposes the input signal into time-frequency (T-F) units and makes binary decisions, based on a Bayesian classifier, as to whether each T-F unit is dominated by the target or the masker. Speech corrupted at low signal-to-noise ratio (SNR) levels (-5 and 0 dB) using different types of maskers is synthesized by this algorithm and presented to normal-hearing listeners for identification. Results indicated substantial improvements in intelligibility (over 60 percentage points in -5 dB babble) over that attained by human listeners with unprocessed stimuli. The findings from this study suggest that algorithms that can reliably estimate the SNR in each T-F unit can improve speech intelligibility.
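
To make the binary T-F decision concrete, the following is a minimal sketch of the ideal binary mask that motivates the study, not the proposed Bayesian classifier itself. It assumes oracle access to the target and masker power in each T-F unit, which a real system does not have; the function names, the nested-list representation, and the 0 dB threshold are illustrative assumptions rather than details from the paper.

```python
import math


def ideal_binary_mask(target_power, masker_power, threshold_db=0.0):
    """Return a 0/1 mask with 1 where the local SNR (dB) exceeds threshold_db.

    target_power and masker_power are nested lists of per-unit power,
    indexed as [frame][band]. Both are assumed known (oracle case);
    the 0 dB default threshold is an illustrative assumption.
    """
    mask = []
    for t_row, m_row in zip(target_power, masker_power):
        row = []
        for t, m in zip(t_row, m_row):
            if t <= 0.0:
                keep = False          # no target energy: discard the unit
            elif m <= 0.0:
                keep = True           # target only: keep the unit
            else:
                local_snr_db = 10.0 * math.log10(t / m)
                keep = local_snr_db > threshold_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Zero out the T-F units of the mixture that the mask marks as masker-dominated."""
    return [[x * b for x, b in zip(x_row, b_row)]
            for x_row, b_row in zip(mixture, mask)]


# Toy 2-frame by 2-band example with made-up power values.
target = [[4.0, 0.1], [0.5, 9.0]]
masker = [[1.0, 1.0], [2.0, 1.0]]
mask = ideal_binary_mask(target, masker)
```

In the setting the abstract describes, the target and masker powers are not available separately, so the keep-or-discard decision for each unit would come from a classifier operating on features of the noisy signal instead of from the oracle SNR used here.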
