Publication | Closed Access
Keyword Based Speaker Localization: Localizing a Target Speaker in a Multi-speaker Environment
2018 | Unknown Venue | 22 Citations | 0 References
Keywords: Engineering, Best Localization Performance, Localization System, Localization, Speech Recognition, Speaker Localization, Speaker Diarization, Robust Speech Recognition, Voice Recognition, Multi-speaker Environment, Health Sciences, Computer Science, Distant Speech Recognition, Signal Processing, Speech Communication, Target Speaker, Multi-speaker Speech Recognition, Speech Processing, Speech Perception, Linguistics, Speaker Recognition
Speaker localization is a challenging task, especially in adverse environmental conditions involving reverberation and noise. In this work, we introduce the new task of localizing the speaker who uttered a given keyword (e.g., the wake-up word of a distant-microphone voice command system) in the presence of overlapping speech. We employ a convolutional neural network (CNN)-based localization system and investigate several identifiers, fed to the system as additional inputs, that characterize this speaker. We conduct experiments both with ground-truth identifiers, obtained assuming that clean speech is available, and in realistic conditions where the identifiers are computed from the corrupted speech. We find that the identifier consisting of the ground-truth time-frequency mask of the target speaker yields the best localization performance, and we propose methods to estimate such a mask from the considered keyword in adverse reverberant and noisy conditions.
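To illustrate why a target-speaker time-frequency mask helps localization when speech overlaps, here is a minimal sketch. It is not the authors' CNN-based system: it swaps in a classical mask-weighted GCC-PHAT time-difference-of-arrival estimator for two microphones. The ideal ratio mask is only one common definition of a ground-truth mask (the abstract does not specify which one the paper uses), and all signals, delays, and helper names (`stft`, `ideal_ratio_mask`, `masked_gcc_phat_lag`, `band_limited_noise`, `delay`) are hypothetical. To keep the result easy to check, the synthetic "target" and "interferer" occupy disjoint frequency bands, which is an idealized form of the time-frequency sparsity that real speech only approximately satisfies.

```python
import numpy as np

N_FFT = 512
HOP = 256


def stft(x, n_fft=N_FFT, hop=HOP):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def ideal_ratio_mask(target_spec, interf_spec, eps=1e-12):
    """Ground-truth ratio mask in [0, 1]; requires the clean target and interference images."""
    t_pow = np.abs(target_spec) ** 2
    i_pow = np.abs(interf_spec) ** 2
    return t_pow / (t_pow + i_pow + eps)


def masked_gcc_phat_lag(X1, X2, mask, max_lag, n_fft=N_FFT):
    """Delay (in samples) of mic 1 relative to mic 2, using mask-weighted TF bins only."""
    cross = X1 * np.conj(X2)
    cross = cross / (np.abs(cross) + 1e-12)  # PHAT weighting: keep phase only
    spectrum = (mask * cross).sum(axis=0)    # weight each TF bin, accumulate over frames
    cc = np.fft.irfft(spectrum, n=n_fft)
    # Rearrange so that index 0 corresponds to lag -max_lag.
    cc = np.concatenate([cc[-max_lag:], cc[:max_lag + 1]])
    return int(np.argmax(cc)) - max_lag


def band_limited_noise(n, lo_bin, hi_bin, rng):
    """White noise restricted to bins [lo_bin, hi_bin) of an n-point real FFT."""
    spec = np.fft.rfft(rng.standard_normal(n))
    keep = np.zeros_like(spec)
    keep[lo_bin:hi_bin] = spec[lo_bin:hi_bin]
    return np.fft.irfft(keep, n=n)


def delay(x, d):
    """Delay a signal by d >= 0 samples, zero-padding at the start."""
    return np.concatenate([np.zeros(d), x[:len(x) - d]])


rng = np.random.default_rng(0)
n = 16000
# Hypothetical sources: target in the lower half of the band, interferer in the upper half.
target = band_limited_noise(n, 1, n // 4, rng)
interferer = band_limited_noise(n, n // 4, n // 2, rng)

d_target, d_interf = 3, 7  # hypothetical inter-microphone delays in samples
# Mic 2 is the reference; mic 1 receives each source d samples later.
t1, t2 = delay(target, d_target), target
i1, i2 = delay(interferer, d_interf), interferer

X1, X2 = stft(t1 + i1), stft(t2 + i2)
mask = ideal_ratio_mask(stft(t2), stft(i2))  # computed at the reference mic

lag_target = masked_gcc_phat_lag(X1, X2, mask, max_lag=10)
lag_interf = masked_gcc_phat_lag(X1, X2, 1 - mask, max_lag=10)
print(lag_target, lag_interf)
```

Weighting by the mask steers the correlation peak toward the target's delay, while weighting by its complement recovers the interferer's delay from the same mixture. In the realistic setting described in the abstract, the mask would have to be estimated from corrupted speech rather than computed from clean source images as done here.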