Publication | Closed Access
Permutation invariant training of deep models for speaker-independent multi-talker speech separation
856 Citations · 20 References · 2017
Venue: ICASSP 2017 (IEEE International Conference on Acoustics, Speech and Signal Processing)
Natural Language Processing · Source Separation · Deep Models · Speech Perception · Engineering · Machine Learning · Health Sciences · Multi-speaker Speech Recognition · Speaker Diarization · Distant Speech Recognition · Speech Processing · Speech Separation · Deep Learning · Permutation Invariant Training · Speech Communication · Speech Recognition
The authors introduce permutation invariant training (PIT), a novel training criterion for speaker-independent multi-talker speech separation, the cocktail-party problem. Unlike multi-class regression and deep clustering, PIT minimizes the separation error directly, which resolves the long-standing label-permutation issue. On the WSJ0 and Danish mixed-speech tasks it compares favorably to NMF, CASA, and DPCL, generalizes to unseen speakers and languages, and is simple to implement and extend.
We propose a novel deep learning training criterion, named permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem. Unlike the multi-class regression technique and the deep clustering (DPCL) technique, our approach minimizes the separation error directly. This strategy effectively solves the long-standing label permutation problem that has prevented progress on deep-learning-based techniques for speech separation. We evaluated PIT on the WSJ0 and Danish mixed-speech separation tasks and found that it compares favorably to non-negative matrix factorization (NMF), computational auditory scene analysis (CASA), and DPCL, and that it generalizes well to unseen speakers and languages. Since PIT is simple to implement and can easily be integrated and combined with other advanced techniques, we believe improvements built upon PIT can eventually solve the cocktail-party problem.
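The core idea of the PIT criterion can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes the network emits one magnitude-spectrogram estimate per speaker with shape (batch, speakers, time, frequency), scores the MSE of every output-to-target assignment, and trains on the minimum-error permutation per utterance, which is what makes the criterion invariant to the labeling order of the targets. The function name `pit_mse_loss` and the tensor layout are assumptions for illustration; the paper applies the criterion to mask-based source estimates, but the permutation logic is identical.

```python
import itertools
import torch

def pit_mse_loss(estimates: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Permutation invariant MSE loss (illustrative sketch).

    estimates, targets: (batch, speakers, time, freq) magnitude spectrograms.
    For each utterance, score every assignment of network outputs to
    reference speakers and keep the minimum error, so the loss does not
    depend on the order in which the target speakers are labeled.
    """
    n_spk = estimates.shape[1]
    per_perm = []
    for perm in itertools.permutations(range(n_spk)):
        # MSE against the targets reordered by this permutation,
        # averaged over speakers, time, and frequency per utterance.
        diff = estimates - targets[:, list(perm)]
        per_perm.append(diff.pow(2).mean(dim=(1, 2, 3)))  # (batch,)
    losses = torch.stack(per_perm, dim=1)                  # (batch, n_spk!)
    return losses.min(dim=1).values.mean()                 # best assignment per utterance
```

For S speakers the loop scores all S! assignments, so for the two- and three-talker mixtures considered in this line of work the overhead is at most six permutations per utterance and is negligible next to the network's forward pass.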
| Year | Citations |
|---|---|
| 2012 | 10.2K |
| 1953 | 4.5K |
| 2011 | 3.1K |
| 2006 | 2.9K |
| 2016 | 1.4K |
| 1997 | 1.2K |
| 2014 | 1.1K |
| 2013 | 944 |
| 2011 | 876 |
| 1992 | 684 |