Speech emotion recognition using combination of features

Abstract

In this paper, we study how the number of speech features and their statistical values affect the accuracy of recognizing emotions in speech. Using a Gaussian Mixture Model (GMM), we identify two effective features, namely Mel Frequency Cepstrum Coefficients (MFCCs) and Auto Correlation Function Coefficients (ACFC), both extracted directly from the speech signal. Using a GMM supervector formed from the values of MFCCs, delta MFCCs and ACFC, we conduct experiments on the Berlin emotional database with six previously proposed emotions: anger, disgust, fear, happy, neutral and sad. Our method achieves an emotion recognition rate of 74.45%, significantly better than the 59.00% achieved previously. To demonstrate the broad applicability of our method, we also conduct experiments with a different set of emotions: anger, boredom, fear, happy, neutral and sad. Our emotion recognition rate of 75.00% is again better than the 71.00% obtained by a hidden Markov model method using MFCC, delta MFCC, cepstral coefficients and speech energy.
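As an illustration of two of the feature types named above, the following is a minimal sketch of how autocorrelation coefficients of a single speech frame and delta (frame-to-frame difference) features might be computed. It is not the paper's implementation: the function names, the normalization by the zero-lag energy, the number of lags, and the two-frame central-difference delta are all assumptions made for this example, and MFCC extraction and GMM supervector construction are left out.

```python
from typing import List, Sequence


def autocorrelation_coefficients(frame: Sequence[float], num_lags: int) -> List[float]:
    """Return autocorrelation values r[1..num_lags] of one frame.

    Assumption: values are normalized by the zero-lag term r[0] (frame energy),
    so each coefficient lies in [-1, 1]. The paper may use another convention.
    """
    n = len(frame)
    if num_lags >= n:
        raise ValueError("num_lags must be smaller than the frame length")
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        # Silent frame: no meaningful correlation structure.
        return [0.0] * num_lags
    return [
        sum(frame[t] * frame[t + lag] for t in range(n - lag)) / energy
        for lag in range(1, num_lags + 1)
    ]


def delta(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Central difference between neighbouring frames.

    Assumption: the first and last frames are repeated at the edges.
    """
    last = len(features) - 1
    out = []
    for i in range(len(features)):
        prev_frame = features[max(i - 1, 0)]
        next_frame = features[min(i + 1, last)]
        out.append([(b - a) / 2.0 for a, b in zip(prev_frame, next_frame)])
    return out
```

In a pipeline of the kind the abstract describes, per-frame vectors such as these would be concatenated with MFCCs before the GMM is fitted; that step is omitted here because the abstract does not specify it.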
