Title-and-Tag Contrastive Vision-and-Language Transformer for Social Media Popularity Prediction

Abstract

Social media is an indispensable part of modern life, and social media popularity prediction (SMPP) plays a vital role in practice. Existing work has paid little attention to issues such as the inconsistency between the words used in tags and in titles, and the transformation of user features. In this paper, we propose a novel approach named Title-and-Tag Contrastive Vision-and-Language Transformer (TTC-VLT), which combines two pre-trained vision-and-language transformers with two additional dense feature components for this prediction task. On the one hand, to learn the differences between titles and tags, we design title-tag contrastive learning over the title-visual and tag-visual branches, which extract multimodal information separately from the two types of text. On the other hand, user identification features are transformed into embedding vectors to capture user attribute details. Extensive experiments show that our approach outperforms the other methods on the social media prediction dataset. Our team achieved 2nd place on the leaderboard of the Social Media Prediction Challenge 2022.
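To make the user-feature step concrete, the following is a minimal sketch of turning user identifiers into embedding vectors by table lookup. It is an illustration under stated assumptions, not the TTC-VLT implementation: the class name `UserIdEmbedding`, the vector size, the reserved index for unseen users, and the example user IDs are all hypothetical, and in a real model the table would be a trainable layer learned jointly with the rest of the network rather than fixed random values.

```python
import random


class UserIdEmbedding:
    """Toy lookup table mapping raw user IDs to dense vectors (hypothetical helper)."""

    def __init__(self, user_ids, dim=8, seed=0):
        rng = random.Random(seed)
        # Assumption: row 0 is reserved for users not seen when the table was built.
        self.index = {uid: i + 1 for i, uid in enumerate(sorted(set(user_ids)))}
        self.dim = dim
        # One row per known user plus the shared "unknown" row.
        # Random initial values stand in for weights that training would learn.
        self.table = [
            [rng.gauss(0.0, 0.1) for _ in range(dim)]
            for _ in range(len(self.index) + 1)
        ]

    def __call__(self, uid):
        # Unknown IDs fall back to row 0 instead of raising an error.
        return self.table[self.index.get(uid, 0)]


# Made-up user IDs, purely for illustration; duplicates share one row.
emb = UserIdEmbedding(["user_a", "user_b", "user_a"], dim=4)
vec_known = emb("user_a")
vec_unknown = emb("never_seen")
```

The resulting vector would typically be concatenated with the other dense features before the prediction head; how TTC-VLT combines them is not specified in this abstract.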
