Publication | Open Access

StarGAN v2: Diverse Image Synthesis for Multiple Domains

Citations: 96 | References: 39 | Year: 2019

TLDR

Image‑to‑image translation models must map between visual domains while providing diverse outputs and scaling to many domains, yet existing methods either lack diversity or require separate models per domain. The authors introduce StarGAN v2, a unified framework that simultaneously achieves diverse, scalable image translation, and release the AFHQ animal‑faces dataset to facilitate evaluation. StarGAN v2 employs a single generative architecture that learns mappings across multiple domains while generating diverse outputs, outperforming baseline models. Experiments on CelebA‑HQ and AFHQ demonstrate that StarGAN v2 achieves higher visual quality, greater diversity, and better scalability than baselines, with code and pretrained models publicly released.

Abstract

A good image-to-image translation model should learn a mapping between different visual domains while satisfying the following properties: 1) diversity of generated images and 2) scalability over multiple domains. Existing methods address only one of these issues, having limited diversity or requiring multiple models for all domains. We propose StarGAN v2, a single framework that tackles both and shows significantly improved results over the baselines. Experiments on CelebA-HQ and a new animal faces dataset (AFHQ) validate our superiority in terms of visual quality, diversity, and scalability. To better assess image-to-image translation models, we release AFHQ, a dataset of high-quality animal faces with large inter- and intra-domain differences. The code, pretrained models, and dataset are available at https://github.com/clovaai/stargan-v2.
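The "single framework" claim can be made concrete with a toy numerical sketch. The names and dimensions below are illustrative, not the released API: the key idea is that one shared generator is conditioned on a per-domain style code (produced by a mapping network from a random latent), so adding domains does not require separate models, and different latents yield diverse outputs for the same input.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_DOMAINS = 3      # e.g. cat, dog, wildlife in AFHQ
LATENT_DIM = 16
STYLE_DIM = 8
IMG_DIM = 32         # flattened toy "image"

# Toy mapping network: one head per domain turns a shared random
# latent code into a domain-specific style vector.
W_map = rng.normal(size=(NUM_DOMAINS, STYLE_DIM, LATENT_DIM))

# Toy generator: a single set of weights shared by all domains;
# the style code (not a domain index) steers the translation.
W_gen = rng.normal(size=(IMG_DIM, IMG_DIM + STYLE_DIM))

def mapping(z, domain):
    """Map latent z to a style code for the given target domain."""
    return np.tanh(W_map[domain] @ z)

def generate(x, style):
    """One generator handles every domain via the style code."""
    return np.tanh(W_gen @ np.concatenate([x, style]))

x = rng.normal(size=IMG_DIM)                # source "image"
z1, z2 = rng.normal(size=(2, LATENT_DIM))   # two random latents

# Same input, same target domain, different latents -> diverse outputs.
out1 = generate(x, mapping(z1, domain=1))
out2 = generate(x, mapping(z2, domain=1))
assert out1.shape == out2.shape == (IMG_DIM,)
assert not np.allclose(out1, out2)          # diversity from the latent
```

This mirrors only the interface of the approach; the actual model uses convolutional networks, a style encoder for reference-guided synthesis, and adversarial training, none of which are modeled here.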
