Convolutional MKL Based Multimodal Emotion Recognition and Sentiment Analysis

TLDR

The explosive growth of multimodal online content, driven by ubiquitous camera-equipped devices and video-enabled wearables, makes mining useful knowledge from it, especially for sentiment analysis, an increasingly urgent challenge. The authors build a multimodal emotion recognition and sentiment analysis system that extracts visual and textual features with deep convolutional neural networks and feeds them into a multiple kernel learning classifier. This approach significantly outperforms the state of the art in multimodal emotion recognition and sentiment analysis on several datasets.

Abstract

Technology has enabled anyone with an Internet connection to easily create and share their ideas, opinions and content with millions of other people around the world. Much of the content being posted and consumed online is multimodal. With billions of phones, tablets and PCs shipping today with built-in cameras and a host of new video-equipped wearables like Google Glass on the horizon, the amount of video on the Internet will only continue to increase. It has become increasingly difficult for researchers to keep up with this deluge of multimodal content, let alone organize or make sense of it. Mining useful knowledge from video is a critical need that will grow exponentially, in step with the global growth of content. This is particularly important in sentiment analysis, as both service and product reviews are gradually shifting from unimodal to multimodal. We present a novel method to extract features from visual and textual modalities using deep convolutional neural networks. By feeding such features to a multiple kernel learning classifier, we significantly outperform the state of the art in multimodal emotion recognition and sentiment analysis on different datasets.
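The abstract does not say which kernels or which MKL solver the authors used. The following is a minimal sketch of the general idea of multiple kernel learning over per-modality features, under these assumptions: one RBF kernel per modality, kernel weights set by centered kernel-target alignment (a simple MKL heuristic, not necessarily the authors' solver), and kernel ridge regression in place of an SVM. The random matrices are synthetic stand-ins for CNN features, and `rbf_kernel`, `fit_mkl`, `predict_mkl` and `make_split` are hypothetical helper names.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) Gram matrix between the rows of X and the rows of Y."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def centered_alignment(K, y):
    """Centered kernel-target alignment between K and the ideal kernel y y^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kc = H @ K @ H
    Yc = H @ np.outer(y, y) @ H
    return float((Kc * Yc).sum() / (np.linalg.norm(Kc) * np.linalg.norm(Yc)))


def fit_mkl(train_feats, y, gammas, lam=0.1):
    """Hypothetical helper: one RBF kernel per modality, weighted by its
    clipped alignment with the labels (weights sum to 1), followed by kernel
    ridge regression on the combined kernel. Labels must be -1/+1."""
    kernels = [rbf_kernel(X, X, g) for X, g in zip(train_feats, gammas)]
    align = np.array([max(centered_alignment(K, y), 0.0) for K in kernels])
    if align.sum() > 0:
        beta = align / align.sum()
    else:  # no kernel aligns with the labels: fall back to uniform weights
        beta = np.full(len(kernels), 1.0 / len(kernels))
    K = sum(b * Km for b, Km in zip(beta, kernels))
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y.astype(float))
    return beta, alpha


def predict_mkl(test_feats, train_feats, gammas, beta, alpha):
    """Sign of the combined-kernel decision function on new samples."""
    K = sum(b * rbf_kernel(Xt, Xs, g)
            for b, Xt, Xs, g in zip(beta, test_feats, train_feats, gammas))
    return np.sign(K @ alpha)


# Synthetic stand-ins for per-modality CNN features (hypothetical data):
# the "textual" block carries the label, the "visual" block is pure noise.
rng = np.random.default_rng(0)


def make_split(n):
    y = np.array([-1, 1] * (n // 2))
    text = rng.normal(size=(n, 5)) + 1.5 * y[:, None]
    visual = rng.normal(size=(n, 8))
    return [text, visual], y


train_feats, y_train = make_split(60)
test_feats, y_test = make_split(40)
gammas = [0.1, 0.125]

beta, alpha = fit_mkl(train_feats, y_train, gammas)
pred = predict_mkl(test_feats, train_feats, gammas, beta, alpha)
acc = float((pred == y_test).mean())
print("kernel weights (text, visual):", beta.round(3))
print("held-out accuracy:", acc)
```

Fixing the kernel weights before training the classifier keeps this sketch short; full MKL solvers such as SimpleMKL instead optimize the kernel weights jointly with the SVM, which is closer to what a dedicated MKL classifier does.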
