
Abstract

We present a flexible bimodal approach to person-dependent emotion recognition in an automotive environment, built by adapting an acoustic and a visual monomodal recognizer and combining their individual results at an abstract decision level. The reference database consists of 840 acted audiovisual examples from seven speakers, expressing three emotional states: positive (joy), negative (anger, irritation), and neutral. In the acoustic module, we compute statistics over commonly used low-level features. Facial expressions are evaluated by an SVM classification of Gabor-filtered face regions. At the subsequent integration stage, both monomodal decisions are fused by a weighted linear combination. An evaluation on the recorded examples yields an average recognition rate of 90.7% for the fusion approach, a performance gain of nearly 4% over the best monomodal recognizer. The system is currently used to improve the usability of automotive infotainment interfaces.
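The decision-level fusion described above can be illustrated with a minimal sketch. The weight value, score vectors, and function names here are illustrative assumptions, not details from the paper; the only element taken from the abstract is the weighted linear combination of two per-class classifier decisions over the three emotion classes.

```python
# Hedged sketch of decision-level fusion by weighted linear combination.
# Scores, weight, and names are illustrative assumptions.

def fuse(p_acoustic, p_visual, w=0.5):
    """Combine two per-class score vectors with a weighted linear sum.

    w is the trust placed in the acoustic recognizer; (1 - w) goes to
    the visual recognizer. Both inputs are assumed normalized scores.
    """
    assert len(p_acoustic) == len(p_visual)
    return [w * a + (1 - w) * v for a, v in zip(p_acoustic, p_visual)]

labels = ["positive", "neutral", "negative"]
p_a = [0.6, 0.3, 0.1]   # hypothetical acoustic classifier scores
p_v = [0.2, 0.5, 0.3]   # hypothetical visual classifier scores

fused = fuse(p_a, p_v, w=0.6)
decision = labels[max(range(len(fused)), key=fused.__getitem__)]
# decision -> "positive"
```

In practice the weight would be tuned per speaker or on held-out data; a single fixed weight is the simplest instance of the abstract-level fusion the paper describes.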
