CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality

TLDR

Existing multimodal sentiment datasets are limited and use unified annotations that fail to capture independent modality sentiment, restricting model performance. This work introduces CH‑SIMS, a Chinese multimodal sentiment dataset with 2,281 video segments annotated for both multimodal and independent unimodal sentiment, enabling studies of modality interaction and unimodal analysis. We provide the CH‑SIMS dataset and a baseline multi‑task learning framework that employs late fusion to combine modalities. Experiments on CH‑SIMS demonstrate state‑of‑the‑art performance and more distinctive unimodal representations, and the dataset and code are publicly available.

Abstract

Previous studies in multimodal sentiment analysis have used limited datasets, which only contain unified multimodal annotations. However, the unified annotations do not always reflect the independent sentiment of single modalities and limit the model to capture the difference between modalities. In this paper, we introduce a Chinese single- and multi-modal sentiment analysis dataset, CH-SIMS, which contains 2,281 refined video segments in the wild with both multimodal and independent unimodal annotations. It allows researchers to study the interaction between modalities or use independent unimodal annotations for unimodal sentiment analysis.Furthermore, we propose a multi-task learning framework based on late fusion as the baseline. Extensive experiments on the CH-SIMS show that our methods achieve state-of-the-art performance and learn more distinctive unimodal representations. The full dataset and codes are available for use at https://github.com/thuiar/MMSA.

References

Page 1

	Year	Citations

Page 1