Publication | Open Access
Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech
Citations: 53
References: 33
Year: 2010
In this paper, we evaluate the vulnerability of a speaker verification (SV) system to synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both SV and speech synthesis have renewed interest in it. We use an HMM-based speech synthesizer, which creates synthetic speech for a targeted speaker through adaptation of a background model, and a GMM-UBM-based SV system. Using 283 speakers from the Wall Street Journal (WSJ) corpus, our SV system has a 0.4% equal error rate (EER). When the system is tested with synthetic speech generated from speaker models derived from the WSJ corpus, 90% of the matched claims are accepted. This result suggests a possible vulnerability in SV systems to synthetic speech. In order to detect synthetic speech prior to recognition, we investigate the use of an automatic speech recognizer (ASR), the dynamic time warping (DTW) distance of mel-frequency cepstral coefficients (MFCC), and the previously proposed average inter-frame difference of log-likelihood (IFDLL). Overall, while SV systems have impressive accuracy, even with the proposed detector, high-quality synthetic speech can lead to an unacceptably high acceptance rate of synthetic speakers.
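The two headline figures in the abstract, the EER and the acceptance rate of synthetic claims, can be illustrated with a minimal sketch. It assumes each verification trial yields one scalar score and that a claim is accepted when the score meets a threshold. The function names and the toy scores are hypothetical and are not taken from the paper, whose exact scoring protocol is not given here.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Return (eer, threshold) at the point where the false acceptance
    rate (FAR) and false rejection rate (FRR) are closest.

    Assumption: a claim is accepted when score >= threshold.
    """
    # Candidate thresholds: every observed score.
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR as the EER estimate.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def acceptance_rate(scores, threshold):
    """Fraction of trials whose score meets the acceptance threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Toy scores, invented for illustration only.
genuine = [2.0, 1.5, 1.0, 0.2]      # true speaker, human speech
impostor = [0.5, -0.5, -1.0, -2.0]  # wrong speaker, human speech
synthetic = [1.2, 0.9, 0.6, 0.1]    # synthetic speech, matched claims

eer, threshold = equal_error_rate(genuine, impostor)
synthetic_accepted = acceptance_rate(synthetic, threshold)
```

The threshold is fixed using human speech only and then applied to the synthetic trials, which is one plausible way to read "90% of the matched claims are accepted"; a low EER on human speech does not by itself bound the acceptance rate for synthetic speech.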