DMF-GAN: Deep Multimodal Fusion Generative Adversarial Networks for Text-to-Image Synthesis

Abstract

Text-to-image synthesis aims to generate highquality realistic images conditioned on text description. The great challenge of this task depends on deeply and seamlessly integrating image and text information. Thus, in this paper, we propose a deep multimodal fusion generative adversarial networks (DMF-GAN) that allows effective semantic interactions for finegrained text-to-image generation. Specifically, through a novel recurrent semantic fusion network, DMF-GAN could consistently manipulate global assignment of text information among isolated fusion blocks. With the assistance of a multi-head attention module, DMF-GAN could model word information from different perspectives and further improve the semantic consistency. In addition, a word-level discriminator is proposed to provide the generator with fine-grained feedback related to each word. Compared with current state-of-the-art methods, our proposed DMFGAN could efficiently synthesize realistic and text-alignment images and achieve better performance on challenging benchmarks. The code link: <uri xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">https://github.com/xueqinxiang/DMF-GAN</uri>

References

Page 1

	Year	Citations

Page 1