Publication | Closed Access
Segmentation of speech using speaker identification
Citations: 96
References: 7
Year: 2002
Unknown Venue
Engineering, Machine Learning, Speaker Segmentation, Speech Recognition, Natural Language Processing, Speaker Identification, Speaker Diarization, Likelihood Ratio, Voice Recognition, Health Sciences, Speaker Identity, Computer Science, Signal Processing, Speech Communication, Multi-speaker Speech Recognition, Speech Processing, Speech Perception, Linguistics, Speaker Recognition
This paper describes techniques for segmentation of conversational speech based on speaker identity. Speaker segmentation is performed using Viterbi decoding on a hidden Markov model network consisting of interconnected speaker sub-networks. Speaker sub-networks are initialized using Baum-Welch training on data labeled by speaker, and are iteratively retrained based on the previous segmentation. If data labeled by speaker is not available, agglomerative clustering is used to approximately segment the conversational speech according to speaker prior to Baum-Welch training. The distance measure for the clustering is a likelihood ratio in which speakers are modeled by Gaussian distributions. The distance between merged segments is recomputed at each stage of the clustering, and a duration model is used to bias the likelihood ratio. Segmentation accuracy using agglomerative clustering initialization matches accuracy using initialization with speaker labeled data.
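The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes diagonal-covariance Gaussians, uses the negative log likelihood ratio as the distance, and omits the duration-model bias entirely. The function names (`ml_variances`, `glr_distance`, `agglomerate`), the variance floor, and the fixed target number of speakers are all hypothetical choices made for illustration.

```python
import math


def ml_variances(frames, floor=1e-6):
    """Per-dimension maximum-likelihood variance of a list of feature vectors.

    The floor is an assumption added here to keep log() finite.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, floor)
        for d in range(dim)
    ]


def glr_distance(seg_a, seg_b):
    """Negative log likelihood ratio between two segments.

    Compares one Gaussian fitted to the pooled frames against one Gaussian
    fitted to each segment. Small values suggest the same speaker.
    """
    def half_n_logdet(frames):
        return 0.5 * len(frames) * sum(math.log(v) for v in ml_variances(frames))

    return half_n_logdet(seg_a + seg_b) - half_n_logdet(seg_a) - half_n_logdet(seg_b)


def agglomerate(segments, n_speakers):
    """Merge the closest pair of clusters until n_speakers clusters remain."""
    clusters = [list(s) for s in segments]
    while len(clusters) > n_speakers:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda p: glr_distance(clusters[p[0]], clusters[p[1]]))
        # Pool the frames; distances to the merged cluster are recomputed
        # from the pooled data on the next pass.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

For example, given three short segments of one-dimensional features where two lie near 0 and one lies near 5, `agglomerate(segments, 2)` pools the two segments near 0 and leaves the third on its own. The resulting clusters would then serve as approximate speaker labels for Baum-Welch initialization.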