Publication | Open Access
Multimodal Emotion Recognition with High-Level Speech and Text Features
74 Citations · 39 References · Year: 2021
Keywords: Engineering, Machine Learning, Affective Neuroscience, Multimodal Sentiment Analysis, Social Sciences, Speech Recognition, Natural Language Processing, Automatic Emotion Recognition, Data Science, Pattern Recognition, Affective Computing, Multimodal Emotion Recognition, IEMOCAP Dataset, Multimodal Signal Processing, Deep Learning, Speech Communication, Speech Analysis, Multi-speaker Speech Recognition, Speech Processing, Speech Input, Speech Perception, Emotion, Emotion Recognition
Automatic emotion recognition is one of the central concerns of the Human-Computer Interaction field, as it can bridge the gap between humans and machines. Current works train deep learning models on low-level data representations to solve the emotion recognition task. Since emotion datasets often have a limited amount of data, these approaches may suffer from overfitting and may learn based on superficial cues. To address these issues, we propose a novel cross-representation speech model, inspired by disentanglement representation learning, to perform emotion recognition on wav2vec 2.0 speech features. We also train a CNN-based model to recognize emotions from text features extracted with Transformer-based models. We further combine the speech-based and text-based results with a score fusion approach. Our method is evaluated on the IEMOCAP dataset in a 4-class classification problem, and it surpasses current works on speech-only, text-only, and multimodal emotion recognition.
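The abstract combines the speech-based and text-based results with a score fusion approach. A minimal sketch of weighted late fusion over per-class scores is shown below; the mixing weight `alpha` and the simple convex-combination scheme are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def score_fusion(speech_probs, text_probs, alpha=0.5):
    """Late fusion of per-class scores from two unimodal models.

    alpha is a hypothetical mixing weight (not specified in the abstract).
    Returns the fused score vector and the argmax class index.
    """
    speech_probs = np.asarray(speech_probs, dtype=float)
    text_probs = np.asarray(text_probs, dtype=float)
    # Convex combination of the two unimodal score distributions
    fused = alpha * speech_probs + (1.0 - alpha) * text_probs
    return fused, int(np.argmax(fused))

# 4-class example, as in the IEMOCAP setup described in the abstract
speech = [0.10, 0.60, 0.20, 0.10]  # speech model favors class 1
text = [0.05, 0.30, 0.50, 0.15]    # text model favors class 2
fused, label = score_fusion(speech, text, alpha=0.5)
```

With equal weighting, the fused scores are [0.075, 0.45, 0.35, 0.125], so the speech model's preferred class wins here; tuning `alpha` on a validation set is a common way to balance the two modalities.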