Publication | Closed Access
Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective
100 Citations · 37 References · 2019
Unknown Venue
Source Separation, Speech Perception, Engineering, Health Sciences, Speaker Separation, Phonetics, Multi-speaker Speech Recognition, Iterative Phase Reconstruction, Speaker Diarization, Distant Speech Recognition, Speech Separation, Speech Processing, Signal Separation, Deep Learning, Phase Reconstruction, Signal Processing, Speech Communication, Speech Recognition
This study investigates phase reconstruction for deep learning based monaural talker-independent speaker separation in the short-time Fourier transform (STFT) domain. The key observation is that, for a mixture of two sources whose magnitudes are accurately estimated, and under the geometric constraint that the mixture is the sum of the two sources, the absolute phase difference between each source and the mixture can be uniquely determined. In addition, the phase of each source at each time-frequency (T-F) unit can be narrowed down to only two candidates. To pick the right candidate, we propose three algorithms based on iterative phase reconstruction, group delay estimation, and phase-difference sign prediction. State-of-the-art results are obtained on the publicly available wsj0-2mix and wsj0-3mix corpora.
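The geometric step described above can be illustrated with a short NumPy sketch. It assumes that the mixture satisfies Y = S1 + S2 exactly at every T-F unit and that the magnitudes |S1| and |S2| are given. In practice these magnitudes would come from a separation network. The law of cosines, |S2|^2 = |Y|^2 + |S1|^2 - 2|Y||S1|cos(θ1 - θY), yields the absolute phase difference Δ1 = |θ1 - θY|, so θ1 is either θY + Δ1 or θY - Δ1. Because Im(S1 Y*) = -Im(S2 Y*), the phase differences of the two sources have opposite signs, and a single per-unit sign therefore resolves both. The function names are hypothetical. The oracle sign in the demo stands in for a learned sign predictor, and the iterative phase reconstruction and group-delay variants are not shown. This is an illustration, not the authors' implementation.

```python
import numpy as np


def phase_difference(mag_y, mag_a, mag_b, eps=1e-12):
    """Absolute phase difference |theta_a - theta_Y| in [0, pi] per T-F unit.

    Law of cosines on the triangle formed by Y = S_a + S_b:
        |S_b|^2 = |Y|^2 + |S_a|^2 - 2 |Y| |S_a| cos(theta_a - theta_Y)
    """
    denom = np.maximum(2.0 * mag_y * mag_a, eps)  # guard against silent T-F units
    cos_d = (mag_y**2 + mag_a**2 - mag_b**2) / denom
    # Estimated magnitudes may violate the triangle inequality; clip to arccos's domain.
    return np.arccos(np.clip(cos_d, -1.0, 1.0))


def phase_candidates(mix, mag1, mag2):
    """The two candidate phases for source 1: theta_Y + delta1 and theta_Y - delta1."""
    phase_y = np.angle(mix)
    delta1 = phase_difference(np.abs(mix), mag1, mag2)
    return phase_y + delta1, phase_y - delta1


def reconstruct(mix, mag1, mag2, sign1):
    """Rebuild both sources from their magnitudes and a per-unit sign (+1/-1) for source 1.

    Since Im(S1 conj(Y)) = -Im(S2 conj(Y)) when Y = S1 + S2, source 2 takes the
    opposite sign, so one sign map fixes both phases.
    """
    mag_y, phase_y = np.abs(mix), np.angle(mix)
    delta1 = phase_difference(mag_y, mag1, mag2)
    delta2 = phase_difference(mag_y, mag2, mag1)
    s1 = mag1 * np.exp(1j * (phase_y + sign1 * delta1))
    s2 = mag2 * np.exp(1j * (phase_y - sign1 * delta2))
    return s1, s2


# Demo on synthetic complex "STFT" values (hypothetical 257 x 100 grid).
rng = np.random.default_rng(0)
shape = (257, 100)
S1 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
S2 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
Y = S1 + S2

# Oracle sign of sin(theta_1 - theta_Y), standing in for a learned sign predictor.
oracle_sign = np.sign(np.imag(S1 * np.conj(Y)))
S1_hat, S2_hat = reconstruct(Y, np.abs(S1), np.abs(S2), oracle_sign)
print(np.allclose(S1_hat, S1), np.allclose(S2_hat, S2))  # True True
```

With exact magnitudes and the correct sign, both sources are recovered exactly. Clipping the cosine to [-1, 1] matters once the magnitudes are estimated rather than exact, because estimates need not satisfy the triangle inequality and `np.arccos` would otherwise return NaN.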