An unsupervised hybrid model based on CNN and ViT for multimodal medical image fusion

Abstract

Multimodal medical image fusion (MMIF) aims to integrate complementary information from images of different modalities into a single image. Existing fusion methods cannot capture global context information effectively, which limits their fusion performance. In this paper, we propose an unsupervised hybrid model based on a CNN and a vision transformer (ViT) for multimodal medical image fusion. The model combines the advantages of both components: the ViT captures global context information and brings stronger learning ability, while the inductive bias of the CNN improves generalization performance. Furthermore, we propose a novel complementarity information fidelity loss, which constrains the fused result to better preserve complementary information and thereby yields a better fusion effect. Both qualitative and quantitative experiments demonstrate the superiority of our method over state-of-the-art fusion methods.
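
The contrast the abstract draws between local CNN features and global ViT context can be illustrated with a toy example. The sketch below is not the architecture proposed in the paper: it works on a 1-D signal, uses scalar tokens with identity projections in place of learned query, key and value weights, and the names `conv1d_same`, `self_attention` and `hybrid_block` are hypothetical. It only shows that a convolution output depends on a small neighbourhood, whereas a self-attention output depends on every position.

```python
import math


def conv1d_same(signal, kernel):
    """Local operator: each output sees only a small window (zero padding)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def self_attention(signal):
    """Global operator: scalar-token self-attention.

    Assumption for illustration: query = key = value = the token itself,
    i.e. no learned projections and no positional encoding.
    """
    out = []
    for q in signal:
        scores = [q * k for k in signal]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, signal)) / total)
    return out


def hybrid_block(signal, kernel):
    """Toy hybrid: local conv features, then global attention, plus a residual."""
    local = conv1d_same(signal, kernel)
    context = self_attention(local)
    return [l + c for l, c in zip(local, context)]


if __name__ == "__main__":
    kernel = [0.25, 0.5, 0.25]
    base = [1.0, 0.5, 0.2, 0.0, 0.0, 0.0, 0.0, 0.3]
    far_change = base[:-1] + [2.0]  # modify only the last, distant sample

    # The convolution output at position 0 ignores the distant change,
    # while the attention and hybrid outputs at position 0 react to it.
    print(conv1d_same(base, kernel)[0] == conv1d_same(far_change, kernel)[0])
    print(self_attention(base)[0] != self_attention(far_change)[0])
    print(hybrid_block(base, kernel)[0] != hybrid_block(far_change, kernel)[0])
```

In this toy setting the convolution supplies the locality prior, and the attention step lets a distant sample influence every output, which is the kind of complementarity a CNN and ViT hybrid is meant to exploit.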
