Publication | Open Access
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
Year: 2022
Natural Language Processing, Multimodal LLM, Novel Methodology, Image Analysis, Machine Learning, High Visual Quality, Engineering, Generative Adversarial Network, Natural Language Guidance, Text Prompts, Vision Language Model, Computer Science, Human Image Synthesis, Generative AI, Deep Learning, Computer Vision, Machine Translation, Synthetic Image Generation
Generating and editing images from open-domain text prompts is a challenging task that has heretofore required expensive and specially trained models. We demonstrate a novel methodology for both tasks that produces images of high visual quality from text prompts of significant semantic complexity without any training, by using a multimodal encoder to guide image generation. We demonstrate on a variety of tasks how using CLIP [37] to guide VQGAN [11] produces outputs of higher visual quality than prior, less flexible approaches such as DALL-E [38], GLIDE [33], and Open-Edit [24], despite not being trained for the tasks presented. Our code is available in a public repository.
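The guidance idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes that "guiding" means adjusting only the generator's latent input so that the encoded image moves closer to the encoded text prompt, while every model weight stays frozen. The names `DECODER`, `IMAGE_ENCODER`, and `TEXT_EMBEDDING` are hypothetical stand-ins: small fixed linear maps take the place of VQGAN and CLIP, and a finite-difference gradient takes the place of backpropagation.

```python
import math

# Hypothetical stand-ins (assumptions, not the real models):
# fixed linear maps replace the VQGAN decoder and the CLIP image encoder.
DECODER = [[1.0, 0.0], [0.5, 1.0], [0.0, -1.0]]      # latent (2) -> "image" (3)
IMAGE_ENCODER = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]]   # "image" (3) -> embedding (2)
TEXT_EMBEDDING = [0.0, 1.0]                          # pretend embedding of the prompt


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score(latent):
    """Decode the latent to an image, embed it, and compare with the text."""
    image = matvec(DECODER, latent)
    image_embedding = matvec(IMAGE_ENCODER, image)
    return cosine_similarity(image_embedding, TEXT_EMBEDDING)


def numerical_gradient(fn, point, eps=1e-6):
    """Central-difference gradient; a real system would use autodiff."""
    grad = []
    for i in range(len(point)):
        plus = list(point)
        minus = list(point)
        plus[i] += eps
        minus[i] -= eps
        grad.append((fn(plus) - fn(minus)) / (2 * eps))
    return grad


def optimize_latent(latent, steps=200, learning_rate=0.1):
    """Gradient ascent on the latent only; the 'models' are never updated."""
    latent = list(latent)
    for _ in range(steps):
        grad = numerical_gradient(score, latent)
        latent = [z + learning_rate * g for z, g in zip(latent, grad)]
    return latent


initial_latent = [1.0, 0.0]
final_latent = optimize_latent(initial_latent)
print(f"similarity before: {score(initial_latent):.3f}")
print(f"similarity after:  {score(final_latent):.3f}")
```

In this sketch the only quantity that changes is the latent vector, which is one way to read "without any training": the pretrained generator and encoder are used as fixed components, and the text prompt steers the output through the similarity score alone.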