Publication | Closed Access
Multi-modal Fusion Using Spatio-temporal and Static Features for Group Emotion Recognition
Citations: 14
References: 9
Year: 2020
Venue: Unknown
Topics: Engineering, Machine Learning, Biometrics, Optical Flow, Multimodal Sentiment Analysis, EmotiW 2020, Social Sciences, Video Interpretation, Multilevel Fusion, Image Analysis, Pattern Recognition, Static Features, Affective Computing, Video Transformer, Group Emotion Recognition, Machine Vision, RGB Difference, Multimodal Signal Processing, Video Understanding, Deep Learning, Feature Fusion, Computer Vision, Facial Expression Recognition, Emotion, Emotion Recognition
This paper presents our approach for the Audio-video Group Emotion Recognition sub-challenge in EmotiW 2020. The task is to classify a video into one of three group emotions: positive, neutral, or negative. Our approach exploits two feature levels for this task: a spatio-temporal feature level and a static feature level. At the spatio-temporal level, we feed multiple input modalities (RGB, RGB difference, optical flow, and warped optical flow) into multiple video classification networks to train the spatio-temporal models. At the static level, we crop all faces and bodies in each frame with a state-of-the-art human pose estimation method and train several kinds of CNNs on these crops using the image-level group-emotion labels. Finally, we fuse the results of all 14 models and achieve third place in this sub-challenge, with classification accuracies of 71.93% and 70.77% on the validation set and test set, respectively.
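The abstract describes fusing the predictions of 14 models into one final label. A minimal sketch of such late (score-level) fusion, under the assumption that each model emits a softmax probability vector over the three group-emotion classes and the fused label is the argmax of their weighted average (the function name, weights, and example values are illustrative, not the authors' exact scheme):

```python
import numpy as np

CLASSES = ["positive", "neutral", "negative"]

def fuse_predictions(prob_vectors, weights=None):
    """Weighted-average per-model class probabilities, pick the top class."""
    probs = np.asarray(prob_vectors, dtype=float)   # shape (n_models, n_classes)
    if weights is None:
        weights = np.ones(len(probs))               # unweighted average by default
    weights = np.asarray(weights, dtype=float)
    fused = weights @ probs / weights.sum()         # convex combination of rows
    return CLASSES[int(np.argmax(fused))], fused

# Toy example with three of the hypothetical per-model outputs:
label, fused = fuse_predictions([
    [0.6, 0.3, 0.1],   # e.g. a spatio-temporal (RGB) model
    [0.5, 0.2, 0.3],   # e.g. an optical-flow model
    [0.2, 0.5, 0.3],   # e.g. a face-crop CNN
])
```

In practice the fusion weights for each of the 14 models could be tuned on the validation set.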