Concepedia

Abstract

A very large data base consisting of over 36 h of unconstrained extemporaneous speech, from 17 speakers, recorded over a period of more than three months, has been analyzed to determine the effectiveness of long-term average features for speaker recognition. Results are shown to be strongly dependent on the voiced speech averaging interval L_ε. Monotonic increases in the probability of correct identification and monotonic decreases in the equal error probability for speaker verification were obtained as L_ε increased, even with substantial time periods between successive sessions. For L_ε corresponding to approximately 39 s of speech, text-independent results (no linguistic constraints embedded into the data base) of 98.05 percent for speaker identification and 4.25 percent for equal error speaker verification were obtained.
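The idea of long-term feature averaging can be illustrated with a minimal sketch. The abstract does not state which features, distance measure, or decision rule were used, so everything below is an assumption made for illustration: per-frame feature vectors are plain lists of floats, the averaging interval is counted in voiced frames, and identification picks the nearest reference vector by Euclidean distance. The names `long_term_averages` and `identify` are hypothetical.

```python
from math import dist
from statistics import fmean


def long_term_averages(frames, interval):
    """Average per-frame feature vectors over non-overlapping blocks.

    frames: list of equal-length feature vectors, one per voiced frame.
    interval: number of voiced frames per average (an assumed stand-in
    for the voiced speech averaging interval in the abstract).
    A trailing partial block is dropped.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    averages = []
    for start in range(0, len(frames) - interval + 1, interval):
        block = frames[start:start + interval]
        # zip(*block) groups the values of each feature dimension together.
        averages.append([fmean(column) for column in zip(*block)])
    return averages


def identify(average, references):
    """Return the speaker whose reference vector is closest.

    Euclidean distance is an assumption; the abstract names no metric.
    """
    return min(references, key=lambda name: dist(average, references[name]))


if __name__ == "__main__":
    # Toy data only; these numbers are not from the study.
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    references = {"speaker_a": [2.0, 3.0], "speaker_b": [10.0, 10.0]}
    for average in long_term_averages(frames, interval=2):
        print(average, identify(average, references))
```

A longer interval averages more frames per vector, which would be expected to reduce the influence of what was said in any single stretch of speech; this is one plausible reading of why the reported results improve as the interval grows.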
