Publication | Closed Access
Object-Driven Text-To-Image Synthesis via Adversarial Training
Citations: 313
References: 26
Year: 2019
Unknown Venue
Natural Language Processing, Engineering, Machine Learning, Generative Adversarial Network, Image Synthesis, Vision Language Model, Generative Models, Complex Images, Visual Question Answering, Generative AI, Adversarial Training, Deep Learning, Text Descriptions, Synthesized Object, Computer Vision, Machine Translation, Synthetic Image Generation
The paper introduces Object-driven Attentive Generative Adversarial Networks (Obj-GANs), which enable attention-driven, multi-stage refinement for synthesizing complex images from text descriptions. Obj-GAN combines a novel object-driven attentive generative network, which attends to the most relevant words and the pre-generated class label to synthesize salient objects, with an object-wise discriminator based on Fast R-CNN that provides rich object-wise discrimination signals. On the MS-COCO benchmark, Obj-GAN increases the inception score by 27% and decreases the FID score by 11% relative to the previous state of the art. A comparison of classic grid attention with the new object-driven attention, including visualizations of the attention layers, shows how the model generates high-quality complex scenes.
In this paper, we propose Object-driven Attentive Generative Adversarial Networks (Obj-GANs) that allow attention-driven, multi-stage refinement for synthesizing complex images from text descriptions. With a novel object-driven attentive generative network, the Obj-GAN can synthesize salient objects by paying attention to their most relevant words in the text descriptions and their pre-generated class labels. In addition, a novel object-wise discriminator based on the Fast R-CNN model is proposed to provide rich object-wise discrimination signals on whether the synthesized object matches the text description and the pre-generated class label. The proposed Obj-GAN significantly outperforms the previous state of the art on various metrics on the large-scale MS-COCO benchmark, increasing the inception score by 27% and decreasing the FID score by 11%. A thorough comparison between the classic grid attention and the new object-driven attention is provided by analyzing their mechanisms and visualizing their attention layers, offering insight into how the proposed model generates complex scenes in high quality.
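The idea of attending to the words most relevant to an object's class label can be illustrated with a small sketch. This is not the paper's implementation: it assumes plain dot-product scoring followed by a softmax, and the toy two-dimensional embeddings, the caption, and the names `attend`, `label_query`, and `word_vectors` are all hypothetical, chosen only to show the mechanism.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(query, word_vectors):
    """Score each word against the query, then return the attention
    weights and the weighted sum of word vectors (the context vector)."""
    weights = softmax([dot(query, w) for w in word_vectors])
    dim = len(word_vectors[0])
    context = [
        sum(wt * w[d] for wt, w in zip(weights, word_vectors))
        for d in range(dim)
    ]
    return weights, context


# Hypothetical toy data: a four-word caption with made-up 2-D embeddings.
words = ["a", "dog", "on", "grass"]
word_vectors = [[0.1, 0.0], [2.0, 0.1], [0.0, 0.1], [0.1, 2.0]]

# Assumption: the query is an embedding of the object's class label ("dog"),
# rather than an image-grid feature as in grid attention.
label_query = [2.0, 0.0]

weights, context = attend(label_query, word_vectors)
top_word = words[weights.index(max(weights))]
print(top_word, [round(w, 3) for w in weights])
```

In grid attention, the query would instead be the feature of an image region; the contrast sketched here is only that the query comes from the object's class label, so the resulting context vector is tied to one object rather than one grid cell.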