Publication | Closed Access
A two-stage approach for improving the perceptual quality of separated speech
Citations: 21
References: 12
Year: 2014
Venue: Unknown
Keywords: Source Separation, Binary Masking, Speech Enhancement, Psycholinguistics, Speech Recognition, Second Stage, Phonetics, Binary Time-frequency Masking, Robust Speech Recognition, Language Studies, Health Sciences, Signal Processing, Speech Communication, Speech Technology, Separated Speech, Speech Analysis, Two-stage Approach, Multi-speaker Speech Recognition, Speech Separation, Speech Processing, Perceptual Quality, Speech Perception, Linguistics
Binary time-frequency masking and model-based nonnegative matrix factorization (NMF) are two common approaches to speech separation. However, binary masking often yields poor perceptual quality, while NMF typically requires pretrained models of both speech and noise and frequently performs poorly. In this paper, we examine whether separation should be performed in a single stage or in two stages. We propose a two-stage algorithm that uses a soft mask for separation in the first stage and NMF in the second stage to improve perceptual quality; the second stage requires training only a speech model. We show that the proposed two-stage approach achieves higher objective perceptual quality and intelligibility than related single-stage methods.
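The two stages described in the abstract can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the first stage applies an ideal ratio (soft) mask computed from oracle speech and noise magnitudes (the abstract does not say how the mask is estimated), and the second stage fits NMF activations against a fixed speech-only dictionary learned with Lee-Seung multiplicative updates for the generalized KL divergence. All function names (`binary_mask`, `ratio_mask`, `kl_div`, `train_speech_dictionary`, `fit_activations`, `two_stage`) are hypothetical, and the synthetic demo treats magnitudes as additive, which real STFT magnitudes are not.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def binary_mask(speech_mag, noise_mag, lc_db=0.0):
    """Ideal binary mask: 1 where the local SNR exceeds lc_db dB, else 0."""
    snr_db = 20.0 * np.log10((speech_mag + EPS) / (noise_mag + EPS))
    return (snr_db > lc_db).astype(float)


def ratio_mask(speech_mag, noise_mag):
    """Ideal ratio (soft) mask in [0, 1]: sqrt(S^2 / (S^2 + N^2))."""
    s2, n2 = speech_mag ** 2, noise_mag ** 2
    return np.sqrt(s2 / (s2 + n2 + EPS))


def kl_div(V, V_hat):
    """Generalized KL divergence D(V || V_hat), the NMF cost used below."""
    return float(np.sum(V * np.log((V + EPS) / (V_hat + EPS)) - V + V_hat))


def train_speech_dictionary(S, n_bases, n_iter=200, seed=0):
    """Learn nonnegative speech bases W (F x K) from clean-speech magnitudes S."""
    rng = np.random.default_rng(seed)
    W = rng.random((S.shape[0], n_bases)) + EPS
    H = rng.random((n_bases, S.shape[1])) + EPS
    ones = np.ones_like(S)
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the KL cost (H, then W)
        H *= (W.T @ (S / (W @ H + EPS))) / (W.T @ ones + EPS)
        W *= ((S / (W @ H + EPS)) @ H.T) / (ones @ H.T + EPS)
    return W


def fit_activations(V, W, n_iter=200, seed=0):
    """Hold the speech dictionary W fixed and estimate activations H >= 0."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.T @ ones + EPS)
    return H


def two_stage(mixture_mag, mask, W_speech, n_iter=200):
    """Stage 1: soft-mask the mixture. Stage 2: re-express it with speech bases only."""
    stage1 = mask * mixture_mag
    H = fit_activations(stage1, W_speech, n_iter)
    return W_speech @ H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    train_speech = rng.random((64, 200))  # stand-in clean speech for training (F x T)
    speech = rng.random((64, 100))        # stand-in test speech magnitudes
    noise = rng.random((64, 100))         # stand-in noise magnitudes
    mixture = speech + noise              # simplification: magnitudes assumed additive
    W = train_speech_dictionary(train_speech, n_bases=16)
    soft = ratio_mask(speech, noise)
    enhanced = two_stage(mixture, soft, W)
    print("binary-mask coverage:", binary_mask(speech, noise).mean())
    print("KL(speech || enhanced):", kl_div(speech, enhanced))
```

Because the second-stage dictionary contains only speech bases, components of the masked input that those bases represent poorly (residual noise, masking artifacts) tend to be left out of `W_speech @ H`. This is one way to read the abstract's point that only a speech model needs to be trained; no noise model appears anywhere in the sketch.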