Concepedia

Publication | Closed Access

Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition

Citations: 215

References: 15

Year: 2014

TLDR

The study investigates using deep neural networks to extract Baum-Welch statistics for i-vector-based, text-independent speaker recognition. Instead of training the universal background model with EM, the authors predefine its components as triphone states and model their posterior occupancy probabilities with a DNN, then combine these assignments with 60-dim MFCCs to compute first-order Baum-Welch statistics for training the i-vector extractor. Although the DNN-derived i-vectors perform worse on their own, they carry complementary speaker information, yielding a 16% relative improvement when fused with standard i-vectors. A different DNN configuration reached performance comparable to the baseline on NIST 2012 (condition C2, female).

Abstract

We examine the use of Deep Neural Networks (DNN) in extracting Baum-Welch statistics for i-vector-based text-independent speaker recognition. Instead of training the universal background model using the standard EM algorithm, the components are predefined and correspond to the set of triphone states, the posterior occupancy probabilities of which are modeled by a DNN. Those assignments are then combined with the standard 60-dim MFCC features to calculate first-order Baum-Welch statistics in order to train the i-vector extractor and extract i-vectors. The DNN-based assignments force the i-vectors to capture the idiosyncratic way in which each speaker pronounces each particular triphone state, which can enrich the short-term spectral representation of the standard i-vectors. After experimenting with Switchboard data and a baseline PLDA classifier, our results showed that although the proposed i-vectors yield inferior performance compared to the standard ones, they are capable of attaining 16% relative improvement when fused with them, meaning that they carry useful complementary information about the speaker’s identity. A further experiment with a different DNN configuration attained comparable performance with the baseline i-vectors on NIST 2012 (condition C2, female).
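
The statistics described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `baum_welch_stats`, the array shapes, and the random stand-ins for MFCC frames and DNN posteriors are all assumptions made for illustration. The zeroth-order statistics are included because i-vector extractors conventionally use them alongside the first-order ones; the abstract itself mentions only the first-order statistics.

```python
import numpy as np


def baum_welch_stats(features, posteriors):
    """Accumulate zeroth- and first-order statistics for one utterance.

    features:   (T, D) array of frame-level features (e.g. 60-dim MFCCs).
    posteriors: (T, C) array of per-frame component posteriors; in the
                paper's setting these would come from a DNN over triphone
                states (here they are only a hypothetical input).
    """
    features = np.asarray(features, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)
    if features.shape[0] != posteriors.shape[0]:
        raise ValueError("features and posteriors must have the same number of frames")

    # Zeroth order: soft count of frames assigned to each component, shape (C,).
    zeroth = posteriors.sum(axis=0)
    # First order: posterior-weighted sum of feature vectors, shape (C, D).
    first = posteriors.T @ features
    return zeroth, first


# Toy data standing in for real MFCCs and DNN outputs (assumed sizes).
rng = np.random.default_rng(0)
T, D, C = 200, 60, 8  # frames, feature dimension, number of components
features = rng.standard_normal((T, D))

# Softmax over random logits gives rows that sum to 1, like DNN posteriors.
logits = rng.standard_normal((T, C))
posteriors = np.exp(logits - logits.max(axis=1, keepdims=True))
posteriors /= posteriors.sum(axis=1, keepdims=True)

N, F = baum_welch_stats(features, posteriors)
```

Because each row of `posteriors` sums to one, the soft counts in `N` add up to the number of frames, and summing `F` over components recovers the plain sum of the feature vectors. Replacing the GMM-derived posteriors with DNN-derived ones changes only the `posteriors` input; the accumulation step stays the same.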
