Publication | Closed Access
Deep learning for robust feature generation in audiovisual emotion recognition
Citations: 391 · References: 33 · Year: 2013 · Venue: Unknown
Topics: Music, Engineering, Machine Learning, Data Science, Feature Learning, Pattern Recognition, Facial Expression Recognition, Autoencoders, Affective Computing, Deep Learning Techniques, Speech Processing, Multimodal Signal Processing, Social Sciences, Multimodal Sentiment Analysis, Deep Learning, Emotion, Emotion Recognition, Speech Recognition
Automatic emotion recognition systems predict affective content from low-level human-centered cues, yet most feature-selection methods capture only linear relationships between features or require labeled data. To overcome these limitations, the authors use deep learning to explicitly model high-order non-linear feature interactions in multimodal signals. Their Deep Belief Network models outperform baselines that do not use deep learning, indicating that the learned high-order non-linear relationships are effective for emotion classification.
Automatic emotion recognition systems predict high-level affective content from low-level human-centered signal cues. These systems have seen great improvements in classification accuracy, due in part to advances in feature selection methods. However, many of these feature selection methods capture only linear relationships between features or alternatively require the use of labeled data. In this paper we focus on deep learning techniques, which can overcome these limitations by explicitly capturing complex non-linear feature interactions in multimodal data. We propose and evaluate a suite of Deep Belief Network models, and demonstrate that these models show improvement in emotion classification performance over baselines that do not employ deep learning. This suggests that the learned high-order non-linear relationships are effective for emotion recognition.
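The abstract describes learning high-order non-linear feature interactions with Deep Belief Networks, which are built by greedily stacking Restricted Boltzmann Machines (RBMs). As a rough illustration only, not the paper's actual architecture or data, the following sketch trains a single Bernoulli RBM with one-step contrastive divergence (CD-1) on toy binary features and uses the hidden-unit activations as learned features; all sizes and hyperparameters here are assumptions.

```python
import numpy as np

# Illustrative Bernoulli RBM trained with CD-1. The paper's real models,
# multimodal inputs, and hyperparameters are not given here, so n_visible,
# n_hidden, and the learning rate below are arbitrary choices.

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        # Negative phase: one Gibbs step (sample hidden, reconstruct visible).
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        # Gradient approximation: <v h>_data minus <v h>_model.
        batch = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / batch
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))  # reconstruction error

# Toy stand-in for concatenated binary audio/video features.
data = (rng.random((64, 12)) < 0.5).astype(float)
rbm = RBM(n_visible=12, n_hidden=6)
errs = [rbm.cd1_step(data) for _ in range(200)]
features = rbm.hidden_probs(data)  # learned non-linear features
print(features.shape)
```

In a DBN, the `features` produced by one trained RBM would become the input for training the next RBM in the stack, and a classifier would be trained on the top-level representation.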