Human-Like Emotion Recognition: Multi-Label Learning from Noisy Labeled Audio-Visual Expressive Speech

Abstract

To capture variation in categorical emotion recognition by human perceivers, we propose a multi-label learning and evaluation method that can employ the distribution of emotion labels generated by every human annotator. In contrast to the traditional accuracy-based performance measure for categorical emotion labels, our proposed learning and inference algorithms use cross entropy to directly compare human and machine emotion label distributions. Our audiovisual emotion recognition experiments demonstrate that emotion recognition can benefit from using a multi-label representation that fully uses both clear and ambiguous emotion data. Further, the results demonstrate that this emotion recognition system can (i) learn the distribution of human annotators directly; (ii) capture the humanlike label noise in emotion perception; and (iii) identify infrequent or uncommon emotional expression (such as frustration) from inconsistently labeled emotion data, which were often ignored in previous emotion recognition systems.

References

Page 1

	Year	Citations

Page 1