Publication | Closed Access
Multi-modal emotion recognition using semi-supervised learning and multiple neural networks in the wild
Citations: 44
References: 37
Year: 2017
Unknown Venue
Engineering, Machine Learning, Affective Neuroscience, Multimodal Learning, Multimodal Sentiment Analysis, Social Sciences, Audio Signals, Image Analysis, Data Science, Pattern Recognition, Fusion Learning, Affective Computing, Human Emotions, Semi-supervised Learning, Multi-modal Emotion Recognition, Machine Vision, Multimodal Signal Processing, Deep Learning, Human Emotion Recognition, Computer Vision, Facial Expression Recognition, Multiple Neural Networks, Emotion, Emotion Recognition
Human emotion recognition receives continuing attention in the computer vision and artificial intelligence domains. This paper proposes a method for classifying human emotions in the wild using multiple neural networks that operate on multi-modal signals: images, facial landmarks, and audio. The proposed method has the following features. First, the learning performance of the image-based network is greatly improved by combining multi-task learning with semi-supervised learning that exploits the spatio-temporal characteristics of videos. Second, a new model is proposed for converting one-dimensional (1D) facial landmark information into two-dimensional (2D) images, together with a CNN-LSTM network built on that model for better emotion recognition. Third, based on the observation that audio signals are often very effective for specific emotions, we propose an audio deep learning mechanism that is robust for those emotions. Finally, so-called emotion adaptive fusion is applied to enable synergy among the multiple networks. On the fifth attempt on the given test set of the EmotiW2017 challenge, the proposed method achieved a classification accuracy of 57.12%.
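The abstract does not say how the emotion adaptive fusion is computed. As a rough illustration only, the sketch below shows one common way to combine several networks so that each modality can count more for some emotions than for others: class-wise weighted late fusion of per-modality probabilities. The function names, the seven-label list, and the weighting scheme are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List

# Assumed label set for illustration; the abstract does not list the classes.
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]


def fuse(probs: Dict[str, List[float]],
         weights: Dict[str, List[float]]) -> List[float]:
    """Class-wise weighted late fusion (hypothetical scheme, not the paper's).

    probs[m][k]   : probability that modality m assigns to emotion k
    weights[m][k] : how much modality m is trusted for emotion k
    Returns fused scores renormalised to sum to 1.
    """
    n = len(EMOTIONS)
    fused = [0.0] * n
    for modality, p in probs.items():
        w = weights[modality]
        if len(p) != n or len(w) != n:
            raise ValueError(f"{modality}: expected {n} values per vector")
        for k in range(n):
            fused[k] += w[k] * p[k]
    total = sum(fused)
    if total <= 0:
        raise ValueError("fused scores must have a positive sum")
    return [score / total for score in fused]


def predict(probs: Dict[str, List[float]],
            weights: Dict[str, List[float]]) -> str:
    """Return the label with the highest fused score."""
    fused = fuse(probs, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]
```

With per-class weights, a modality such as audio can be given a large weight for the few emotions it detects well and a small weight elsewhere, which a single scalar weight per network cannot express. In practice such weights would be tuned on a validation set.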