Speech Emotion Recognition Using Fourier Parameters

TLDR

Harmony features have recently been explored for speech emotion recognition. The authors find that the first- and second-order differences of harmony features also play an important role, and they propose a new Fourier parameter (FP) model that uses the perceptual content of voice quality together with these differences for speaker-independent speech emotion recognition. The FP features improve recognition rates over MFCC-based methods by 6.8 to 16.6 points on EMODB, CASIA, and EESDB, and combining FP with MFCC yields further gains of 10 to 17.5 points.

Abstract

Harmony features have recently been studied for speech emotion recognition. We find that the first- and second-order differences of harmony features also play an important role in speech emotion recognition. We therefore propose a new Fourier parameter model that uses the perceptual content of voice quality and the first- and second-order differences for speaker-independent speech emotion recognition. Experimental results show that the proposed Fourier parameter (FP) features are effective in identifying various emotional states in speech signals. They improve the recognition rates over methods using Mel-frequency cepstral coefficient (MFCC) features by 16.2, 6.8 and 16.6 points on the German database (EMODB), the Chinese language database (CASIA) and the Chinese elderly emotion database (EESDB), respectively. In particular, when FP is combined with MFCC, the recognition rates on these databases improve further by 17.5, 10 and 10.5 points, respectively.
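
To make "first- and second-order differences" concrete, here is a minimal sketch that appends frame-to-frame differences to a sequence of per-frame feature vectors. The abstract does not give the paper's exact differencing scheme, so the simple backward difference, the frame alignment, and the function names `frame_differences` and `with_differences` are all illustrative assumptions, not the authors' method.

```python
from typing import List

Frame = List[float]  # one feature vector per analysis frame (assumed layout)


def frame_differences(frames: List[Frame]) -> List[Frame]:
    """Return out[t] = frames[t + 1] - frames[t], element by element."""
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


def with_differences(frames: List[Frame]) -> List[Frame]:
    """Concatenate each frame with its first- and second-order differences.

    Assumption: plain backward differences. The first two frames are
    dropped because they have no complete second-order difference.
    """
    d1 = frame_differences(frames)   # length n - 1
    d2 = frame_differences(d1)       # length n - 2
    # Align so that row t holds frames[t + 2], the first-order difference
    # ending at that frame, and the second-order difference ending there.
    return [f + a + b for f, a, b in zip(frames[2:], d1[1:], d2)]


if __name__ == "__main__":
    toy = [[1.0, 2.0], [2.0, 4.0], [4.0, 8.0], [7.0, 14.0]]
    for row in with_differences(toy):
        print(row)
```

Each output row is three times as wide as an input frame: the static features, then the first-order differences, then the second-order differences.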
