Publication | Closed Access
Investigation on Neural Bandwidth Extension of Telephone Speech for Improved Speaker Recognition
Year: 2019 | Citations: 13 | References: 11
Engineering, Machine Learning, Bandwidth Extension, Upsampled Telephone Speech, Telephone Speech, Speech Recognition, Neural Bandwidth Extension, Robust Speech Recognition, Voice Recognition, Health Sciences, Improved Speaker Recognition, Deep Learning, Distant Speech Recognition, Signal Processing, Speech Communication, Speech Technology, Multi-speaker Speech Recognition, Speech Processing, Speech Input, Speech Perception, NB Telephone Speech, Speaker Recognition
We extend our previous work on training a mixed-bandwidth (BW) speaker recognition system by predicting the missing information in the upper band (UB) of upsampled telephone speech. Mixed-BW systems combine speech from narrowband (NB) and wideband (WB) corpora by upsampling the NB speech with a low-pass filter interpolator, so no information in the original WB speech is lost. In this work, we explore a deep residual fully convolutional neural network (CNN) and a bidirectional long short-term memory (BLSTM) network, along with a previously proposed deep neural network (DNN), for bandwidth extension (BWE) of NB telephone speech. Speaker recognition systems trained with bandwidth-extended features outperformed the mixed-BW and NB baseline systems. In terms of detection cost function (DCF), the CNN-BWE system improved by 10.78% and 15.96% (relative) on the Speakers In The Wild (SITW) eval core and assist-multi-speaker conditions, respectively, w.r.t. the NB baseline, and by 3.21% and 4.13% w.r.t. the mixed-BW baseline.
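The basic upsampling step that the mixed-BW baseline relies on can be illustrated with a minimal sketch. The abstract does not specify the interpolator design, so everything here is an assumption for illustration: a factor of 2 (for example 8 kHz NB to 16 kHz WB), a 63-tap Hamming-windowed sinc filter, and the hypothetical function names `lowpass_taps` and `upsample_nb_to_wb`.

```python
import math


def lowpass_taps(num_taps=63, cutoff=0.5):
    """Hamming-windowed sinc low-pass FIR filter.

    `cutoff` is a fraction of the Nyquist frequency at the output rate.
    `num_taps` is assumed odd so the filter has an integer group delay.
    """
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response, with the x == 0 limit handled explicitly.
        ideal = cutoff if x == 0 else math.sin(math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def upsample_nb_to_wb(signal, factor=2, num_taps=63):
    """Upsample by zero-stuffing, then low-pass filtering (illustrative only)."""
    # Insert (factor - 1) zeros after every input sample.
    stuffed = []
    for s in signal:
        stuffed.append(s)
        stuffed.extend([0.0] * (factor - 1))

    # Scale by `factor` to restore the amplitude lost to zero-stuffing.
    taps = [factor * t for t in lowpass_taps(num_taps, 1.0 / factor)]
    delay = (num_taps - 1) // 2

    # Direct-form convolution, shifted by the group delay so that the
    # output stays time-aligned with the zero-stuffed input.
    out = []
    for i in range(len(stuffed)):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + delay - k
            if 0 <= j < len(stuffed):
                acc += t * stuffed[j]
        out.append(acc)
    return out


# Example input: a 1 kHz tone sampled at an assumed NB rate of 8 kHz.
nb_tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(200)]
wb_tone = upsample_nb_to_wb(nb_tone)
```

Under these assumptions the output has twice as many samples as the input, but the low-pass filter leaves the upper half of the new spectrum essentially empty. That empty upper band is the missing information the BWE networks in the abstract are trained to predict.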