Investigation on Neural Bandwidth Extension of Telephone Speech for Improved Speaker Recognition

Abstract

We extend our previous work on training mixed-bandwidth (BW) speaker recognition systems by predicting the missing information in the upper band (UB) of upsampled telephone speech. Mixed-BW systems combine speech from narrowband (NB) and wideband (WB) corpora by upsampling the NB speech with a low-pass filter interpolator, so that no information in the original WB speech is lost. In this work, we explore a deep residual fully convolutional neural network (CNN) and a bidirectional long short-term memory (BLSTM) network, along with a previously proposed deep neural network (DNN), for bandwidth extension (BWE) of NB telephone speech. Speaker recognition systems trained with bandwidth-extended features outperformed both the mixed-BW and NB baseline systems. In terms of detection cost function (DCF), the CNN-BWE system improved by 10.78% and 15.96% (relative) on the Speakers In The Wild (SITW) eval core and assist-multi-speaker conditions, respectively, w.r.t. the NB baseline, and by 3.21% and 4.13% w.r.t. the mixed-BW baseline.
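The basic upsampling step used by mixed-BW systems can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the factor of 2 (e.g. an assumed 8 kHz NB rate and 16 kHz WB rate), and the 63-tap Hamming-windowed sinc filter are all illustrative assumptions. The sketch inserts zeros between NB samples and then applies a low-pass filter, so the upper band of the result stays essentially empty; that empty band is what a BWE network would be trained to predict.

```python
import math


def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc low-pass FIR (hypothetical helper).

    cutoff is a fraction of the Nyquist frequency, in (0, 1).
    The taps are normalized to unit gain at DC.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response; the x == 0 case is its limit.
        h = cutoff if x == 0 else math.sin(math.pi * cutoff * x) / (math.pi * x)
        # Hamming window to reduce truncation ripple.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    gain = sum(taps)
    return [t / gain for t in taps]


def upsample_lowpass(signal, factor=2, num_taps=63):
    """Upsample by an integer factor with a low-pass filter interpolator.

    Assumes num_taps is odd so the filter delay is a whole number of samples.
    """
    # Step 1: zero-stuffing, i.e. factor - 1 zeros after every input sample.
    stuffed = [0.0] * (len(signal) * factor)
    stuffed[::factor] = signal

    # Step 2: low-pass at the original Nyquist frequency. The extra gain of
    # `factor` compensates for the energy lost to the inserted zeros.
    taps = [factor * t for t in lowpass_fir(num_taps, 1.0 / factor)]
    delay = (num_taps - 1) // 2  # compensate the linear-phase filter delay

    out = []
    for i in range(len(stuffed)):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + delay - k
            if 0 <= j < len(stuffed):
                acc += t * stuffed[j]
        out.append(acc)
    return out


if __name__ == "__main__":
    # A 440 Hz tone sampled at an assumed NB rate of 8 kHz.
    nb = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(400)]
    wb = upsample_lowpass(nb, factor=2)
    print(len(nb), "->", len(wb), "samples")
```

In practice a library resampler would replace the explicit loops; the pure-Python form is used here only to keep the two steps (zero insertion, then low-pass filtering) visible.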
