Publication | Closed Access
Speech emotion recognition with acoustic and lexical features
Citations: 174
References: 21
Year: 2015
Venue: unknown
Engineering, Machine Learning, Speech Corpus, Multimodal Sentiment Analysis, Social Sciences, Text Mining, Speech Recognition, Natural Language Processing, Data Science, Pattern Recognition, Affective Computing, Emotion Vector, Emotion Recognition System, Speech Emotion Recognition, Computer Science, Deep Learning, Speech Communication, Speech Analysis, Facial Expression Recognition, Speech Processing, Speech Perception, Emotion, Linguistics, Emotion Recognition
In this paper, we explore one of the key aspects of building an emotion recognition system: generating suitable feature representations. We generate feature representations at both the acoustic and lexical levels. At the acoustic level, we first extract low-level features such as intensity, F0, jitter, shimmer, and spectral contours. We then derive several acoustic feature representations from these low-level features: statistics over the features, a new representation derived from a set of low-level acoustic codewords, and a new representation based on Gaussian Supervectors. At the lexical level, we propose a new feature representation named emotion vector (eVector), and we also use the traditional Bag-of-Words (BoW) feature. We apply these feature representations to emotion recognition and compare their performance on the USC-IEMOCAP database. We also combine the different feature representations via early fusion and late fusion. Our experimental results show that late fusion of acoustic and lexical features achieves a four-class emotion recognition accuracy of 69.2%.
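The difference between early and late fusion can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the classifiers or the fusion rule, so the function names, the equal weighting, and the toy feature and posterior values below are all assumptions made for illustration. Early fusion is shown as concatenating per-utterance acoustic and lexical feature vectors before a single classifier would see them; late fusion is shown as a weighted average of the class posteriors produced by two separately trained classifiers.

```python
def early_fusion(acoustic, lexical):
    """Concatenate each utterance's acoustic and lexical feature vectors.

    The fused vectors would then be fed to a single classifier (not shown).
    """
    return [a + l for a, l in zip(acoustic, lexical)]


def late_fusion(acoustic_probs, lexical_probs, weight=0.5):
    """Combine per-class posteriors from two classifiers by weighted averaging.

    `weight` is a hypothetical mixing parameter for the acoustic classifier;
    the returned values are the indices of the highest fused posterior.
    """
    labels = []
    for pa, pl in zip(acoustic_probs, lexical_probs):
        fused = [weight * a + (1.0 - weight) * l for a, l in zip(pa, pl)]
        labels.append(max(range(len(fused)), key=fused.__getitem__))
    return labels


# Toy data (made up): 2 utterances, 3 acoustic and 2 lexical features each.
acoustic = [[0.2, 1.5, 0.1], [0.9, 0.3, 0.4]]
lexical = [[1.0, 0.0], [0.0, 2.0]]
fused_features = early_fusion(acoustic, lexical)  # 2 vectors of length 5

# Toy posteriors (made up) over four emotion classes from two classifiers.
acoustic_probs = [[0.6, 0.2, 0.1, 0.1], [0.1, 0.4, 0.3, 0.2]]
lexical_probs = [[0.3, 0.5, 0.1, 0.1], [0.1, 0.2, 0.6, 0.1]]
labels = late_fusion(acoustic_probs, lexical_probs)
print(labels)  # [0, 2]
```

In the second utterance the acoustic classifier alone would pick class 1, while the fused posterior picks class 2, which shows how late fusion lets one modality override a weak decision from the other.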