Publication | Closed Access
End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors
158
Citations
32
References
2020
Year
Unknown Venue
Engineering, Health Sciences, Multi-speaker Speech Recognition, Attractor Calculation, Speaker Diarization, Robust Speech Recognition, Distant Speech Recognition, Speech Processing, Unknown Number, Speech Perception, End-to-end Speaker Diarization, Signal Processing, Acoustic Modeling, Speech Communication, Speaker Recognition, Speech Recognition
This paper addresses end-to-end speaker diarization for an unknown number of speakers. Recently proposed end-to-end speaker diarization has outperformed conventional clustering-based speaker diarization, but it has one drawback: it is less flexible in terms of the number of speakers. This paper proposes a method for encoder-decoder based attractor calculation (EDA), which first generates a flexible number of attractors from a speech embedding sequence. The generated attractors are then multiplied by the speech embedding sequence to produce the same number of speaker activities. The speech embedding sequence is extracted using the conventional self-attentive end-to-end neural speaker diarization (SA-EEND) network. In a two-speaker condition, our method achieved a 2.69% diarization error rate (DER) on simulated mixtures and an 8.07% DER on the two-speaker subset of CALLHOME, while vanilla SA-EEND attained 4.56% and 9.54%, respectively. With an unknown number of speakers, our method attained a 15.29% DER on CALLHOME, while the x-vector-based clustering method achieved a 19.43% DER.
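The step in which attractors are multiplied by the embedding sequence can be illustrated with a minimal sketch. It covers only that product: the encoder-decoder that generates the attractors is not shown. The function name `speaker_activities`, the sigmoid squashing, the 0.5 threshold, and the toy two-dimensional values are all assumptions made for illustration, not details given in the abstract.

```python
import math


def sigmoid(x):
    """Logistic function, used here (as an assumption) to map scores to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def speaker_activities(embeddings, attractors, threshold=0.5):
    """Score each frame against each attractor with a dot product.

    embeddings: T frames, each a list of D floats.
    attractors: S attractors, each a list of D floats (one per speaker).
    Returns (posteriors, decisions), both shaped T x S.
    """
    posteriors = [
        [sigmoid(sum(e * a for e, a in zip(frame, attractor)))
         for attractor in attractors]
        for frame in embeddings
    ]
    # A frame may be active for several speakers (overlap) or none (silence).
    decisions = [[p > threshold for p in row] for row in posteriors]
    return posteriors, decisions


# Hypothetical toy data: 4 frames, 2 attractors, 2-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
attractors = [[4.0, -1.0], [-1.0, 4.0]]

posteriors, decisions = speaker_activities(embeddings, attractors)
for frame_index, row in enumerate(decisions):
    print(frame_index, row)
```

Because the output has one column per attractor, the number of speaker activity tracks follows the number of attractors, which is what lets the speaker count vary.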