Publication | Open Access
A multimodal fusion network with attention mechanisms for visual–textual sentiment analysis
2023 · 44 Citations · 43 References
Existing visual–textual sentiment analysis methods often perform poorly because they make limited use of the correlation between modalities, i.e., they neglect both the heterogeneity and the homogeneity of the visual and textual modalities. To overcome these limitations, we propose a Multimodal Fusion Network (MFN) with a multi-head self-attention mechanism. MFN minimizes noise interference between modalities through neural networks and attention mechanisms to obtain independent visual and textual features. Furthermore, it exploits correlations between fine-grained local-region feature representations of the modalities, computed with different numbers of hidden neurons, to leverage complementary information from heterogeneous visual and textual data. Extensive experiments show that MFN outperforms 11 state-of-the-art methods by at least 0.11%, 0.13%, and 0.38% on the Twitter, Flickr, and Getty Image datasets, respectively.
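The abstract does not specify MFN's architecture in detail, but the core idea of fusing visual region features and textual word features through multi-head self-attention can be sketched as follows. This is a minimal illustrative NumPy implementation, not the authors' model: the feature dimensions, token counts, random projection weights, and mean pooling are all assumptions standing in for learned components.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, num_heads, rng):
    # tokens: (seq_len, d_model); d_model must be divisible by num_heads
    seq_len, d_model = tokens.shape
    d_head = d_model // num_heads
    # random projections stand in for learned Q/K/V/output weights
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    # split into heads: (num_heads, seq_len, d_head)
    split = lambda x: x.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    # scaled dot-product attention per head, letting visual and
    # textual tokens attend to each other in one joint sequence
    scores = softmax(Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head), axis=-1)
    out = (scores @ Vh).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

rng = np.random.default_rng(0)
visual = rng.standard_normal((4, 16))   # 4 hypothetical image-region features
textual = rng.standard_normal((6, 16))  # 6 hypothetical word features
fused_tokens = np.concatenate([visual, textual], axis=0)  # (10, 16)
fused = multi_head_self_attention(fused_tokens, num_heads=4, rng=rng)
sentiment_repr = fused.mean(axis=0)  # pooled joint representation, shape (16,)
print(fused.shape, sentiment_repr.shape)
```

Concatenating the two modalities into one token sequence before attention lets each head weigh cross-modal pairs (region–word) as easily as within-modal pairs, which is one common way to exploit the complementary information the abstract describes; the paper itself may fuse the modalities differently.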