Publication | Open Access

Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech

Citations: 53
References: 33
Year: 2010

Abstract

In this paper, we evaluate the vulnerability of a speaker verification (SV) system to synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both SV and speech synthesis have renewed interest in it. We use an HMM-based speech synthesizer, which creates synthetic speech for a targeted speaker through adaptation of a background model, and a GMM-UBM-based SV system. Using 283 speakers from the Wall Street Journal (WSJ) corpus, our SV system has a 0.4% equal error rate (EER). When the system is tested with synthetic speech generated from speaker models derived from the WSJ corpus, 90% of the matched claims are accepted. This result suggests a possible vulnerability in SV systems to synthetic speech. In order to detect synthetic speech prior to recognition, we investigate the use of an automatic speech recognizer (ASR), the dynamic time warping (DTW) distance of mel-frequency cepstral coefficients (MFCC), and the previously proposed average inter-frame difference of log-likelihood (IFDLL). Overall, while SV systems have impressive accuracy, even with the proposed detector, high-quality synthetic speech can lead to an unacceptably high acceptance rate of synthetic speakers.
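
One of the detection cues named in the abstract is the DTW distance between MFCC sequences. The abstract does not give the local distance, path constraints, normalization, or how the distance feeds a detector, so the following is only a minimal sketch of textbook DTW under assumed choices: Euclidean distance between frames, unconstrained steps, and no length normalization. The function name `dtw_distance` and the toy one-dimensional "frames" are illustrative, not taken from the paper.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Each element of `a` and `b` is one frame (for example, an MFCC vector).
    Assumptions (not from the paper): Euclidean local distance,
    steps of (1,0), (0,1), (1,1), and no normalization by path length.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of `a` advances alone
                cost[i][j - 1],      # frame of `b` advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Toy stand-ins for MFCC sequences: the second repeats one frame,
# so the warping path can absorb the repetition at zero cost.
reference = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(reference, stretched))
```

In practice the frames would be real MFCC vectors extracted from audio, and a decision threshold on the distance would have to be tuned on held-out data; neither step is shown here.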
