Publication | Closed Access
Single-channel Speech Extraction Using Speaker Inventory and Attention Network
Citations: 75
References: 26
Year: 2019
Unknown Venue
Source Separation, Engineering, Machine Learning, Speech Recognition, Natural Language Processing, Attention Network, Speaker Diarization, Libri Corpus, Health Sciences, Distant Speech Recognition, Signal Processing, Speech Communication, Voice, Multi-speaker Speech Recognition, Separation Process, Speech Processing, Speech Separation, Speech Perception, Speaker Recognition
Neural network-based speech separation has received a surge of interest in recent years. Previously proposed methods are either speaker independent or extract a target speaker's voice by using a snippet of that speaker's voice. In applications such as home devices or office meeting transcription, a list of possible speakers is available and can be leveraged for speech separation. This paper proposes a novel speech extraction method that uses an inventory of voice snippets (speaker enrollment data) from possible interfering speakers, in addition to that of the target speaker. Furthermore, an attention-based network architecture is proposed to form time-varying masks for both the target and the other speakers during the separation process. This architecture does not reduce each speaker's enrollment audio to a single vector, so each short time frame of the input mixture signal can be aligned and accurately compared with the enrollment signals. We evaluate the proposed system on a speaker extraction task derived from the Libri corpus and show the effectiveness of the method.
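To make the frame-level alignment idea concrete, the following is a minimal sketch of scaled dot-product attention between mixture frames and each speaker's enrollment frames. It is not the paper's architecture: the function names (`attend`, `speaker_masks`), the random feature vectors, and the use of one soft weight per speaker per frame (rather than learned time-frequency masks) are all assumptions made for illustration.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(frame, enrollment):
    """Summarize one speaker's enrollment frames for a single mixture frame.

    Every enrollment frame is kept; the mixture frame attends over all of
    them instead of being compared with one pooled speaker vector.
    """
    scale = math.sqrt(len(frame))
    weights = softmax([dot(frame, e) / scale for e in enrollment])
    context = [
        sum(w * e[i] for w, e in zip(weights, enrollment))
        for i in range(len(frame))
    ]
    return context, weights


def speaker_masks(mixture, inventory):
    """Return one soft weight per speaker for each mixture frame.

    Hypothetical simplification: the weights for a frame sum to 1 across
    the inventory, standing in for the time-varying masks of a real model.
    """
    masks = []
    for frame in mixture:
        scores = [dot(frame, attend(frame, enr)[0]) for enr in inventory]
        masks.append(softmax(scores))
    return masks


def random_frames(count, dim=8):
    """Stand-in for learned frame embeddings (assumption, not real features)."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


random.seed(0)
mixture = random_frames(20)
inventory = [random_frames(15), random_frames(25)]  # target + one interferer
masks = speaker_masks(mixture, inventory)
```

Because the attention weights are recomputed for every mixture frame, the per-speaker weights can change from frame to frame, which is the property the abstract attributes to not collapsing the enrollment audio into a single vector.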