Publication | Open Access
Meta-Classifiers in Acoustic and Linguistic Feature Fusion-Based Affect Recognition
2006 · Unknown Venue · 61 Citations · 10 References
Topics: Spoken Utterances, Engineering, Machine Learning, Speech Corpus, Spoken Language Processing, Multimodal Sentiment Analysis, Text Mining, Speech Recognition, Natural Language Processing, Pattern Recognition, Phonetics, Affective Computing, Language Studies, Speech Communication, Speech Analysis, Acoustic Features, Speech Processing, Mutual Information, Speech Perception, Emotion, Linguistics, Emotion Recognition
We propose a novel approach to affect recognition based on acoustic and linguistic analysis of spoken utterances. To achieve maximum discriminative power while robustly integrating these information sources, fusion is performed at the feature level. For classification, we use meta-classifiers such as StackingC and boosting for stabilized performance, combining classifiers within ensembles. Extensive comparisons of diverse base classifiers, including support vector machines, neural networks, stochastic models, and decision trees, are carried out. A total of 381 acoustic features are extracted, and their relevance is determined by sequential forward floating search, compared against reduction by principal component analysis. Several variants of linguistic feature calculation are described and ranked, including bag-of-words, n-grams, salience, and mutual information. Furthermore, reduction of the 2,334 linguistic features by stopping (stop-word removal) and stemming, or by filter-based selection methods, is evaluated. Seven discrete emotions as described in the MPEG-4 standard are recognized within an existing recognition engine. The presented results are based on two large databases of 4,336 acted and real emotion samples from movies, chat, and car-interaction dialogues. A significant gain and outstanding overall performance are observed with this novel fusion and the use of ensembles.
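The abstract does not include code; as a rough illustration only, the following sketch shows feature-level fusion, feature reduction, and a stacking ensemble using scikit-learn stand-ins. Everything here is assumed for illustration: the data is synthetic and heavily scaled down (the paper uses 381 acoustic and 2,334 linguistic features), `StackingClassifier` stands in for StackingC, and scikit-learn's `SequentialFeatureSelector` performs plain forward selection rather than the floating variant (SFFS) used in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import StackingClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 200
X_acoustic = rng.normal(size=(n, 30))    # synthetic stand-in for 381 acoustic features
X_linguistic = rng.normal(size=(n, 10))  # synthetic stand-in for reduced linguistic features
y = rng.integers(0, 7, size=n)           # seven discrete emotion classes

# Feature-level fusion: concatenate both streams into one feature vector.
X = np.hstack([X_acoustic, X_linguistic])

# Plain forward selection as a simplified stand-in for SFFS ...
selector = SequentialFeatureSelector(
    SVC(kernel="linear"), n_features_to_select=8, direction="forward", cv=3
)
X_sel = selector.fit_transform(X, y)

# ... compared, as in the paper, against PCA reduction to the same dimensionality.
X_pca = PCA(n_components=8).fit_transform(X)

# Stacking: diverse base classifiers combined by a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("svm", SVC()),
        ("mlp", MLPClassifier(max_iter=1000)),
        ("tree", DecisionTreeClassifier()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
stack.fit(X_sel, y)
print(stack.predict(X_sel[:5]))
```

Fusing at the feature level, rather than at the decision level, lets the selector and the ensemble exploit dependencies between acoustic and linguistic cues before any classifier commits to a decision.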
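Under the same caveats, a second sketch illustrates the linguistic side: toy utterances and binary labels stand in for the real corpora and seven emotion classes, scikit-learn's English stop-word list stands in for "stopping", stemming is omitted, and mutual information ranks the unigram/bigram features.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

utterances = [
    "this is really great",
    "i am so angry right now",
    "what a wonderful surprise",
    "leave me alone",
]
labels = [1, 0, 1, 0]  # toy binary affect labels for illustration

# Bag-of-words with unigrams and bigrams; the built-in English stop-word
# list stands in for the paper's "stopping" step (stemming omitted here).
vectorizer = CountVectorizer(ngram_range=(1, 2), stop_words="english")
X_text = vectorizer.fit_transform(utterances)

# Rank n-grams by mutual information with the affect label and keep the top k.
selector = SelectKBest(mutual_info_classif, k=5)
X_top = selector.fit_transform(X_text, labels)
print(vectorizer.get_feature_names_out()[selector.get_support()])
```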