Interpreting the Latent Space of GANs for Semantic Face Editing

TLDR

GANs can generate high-fidelity images, yet how they map random latent codes to realistic outputs remains poorly understood. Prior work assumed that the latent space follows a distributed representation, even while observing vector-arithmetic behavior. The authors introduce InterFaceGAN, a framework for semantic face editing that interprets the latent semantics learned by GANs and studies how different attributes are encoded in the latent space. The study shows that well-trained GANs learn a representation that is disentangled after linear transformations, enabling precise control over attributes such as gender, age, expression, eyeglasses, and pose, as well as artifact correction. InterFaceGAN also extends to real-image manipulation when combined with GAN inversion techniques or encoder-involved models. The results suggest that this disentanglement arises spontaneously from learning to synthesize faces.

Abstract

Despite recent advances of Generative Adversarial Networks (GANs) in high-fidelity image synthesis, there is still insufficient understanding of how GANs map a latent code sampled from a random distribution to a photo-realistic image. Previous work assumes that the latent space learned by GANs follows a distributed representation, yet observes the vector arithmetic phenomenon. In this work, we propose a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs. In this framework, we conduct a detailed study on how different semantics are encoded in the latent space of GANs for face synthesis. We find that the latent code of well-trained generative models actually learns a disentangled representation after linear transformations. We explore the disentanglement between various semantics and manage to decouple some entangled semantics with subspace projection, leading to more precise control of facial attributes. Besides manipulating gender, age, expression, and the presence of eyeglasses, we can even vary the face pose as well as fix the artifacts accidentally generated by GAN models. The proposed method is further applied to achieve real image manipulation when combined with GAN inversion methods or some encoder-involved models. Extensive results suggest that learning to synthesize faces spontaneously brings a disentangled and controllable facial attribute representation.
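
The linear editing and subspace projection mentioned in the abstract can be illustrated with a small sketch. It assumes that each attribute corresponds to a single direction in latent space, that editing means moving a latent code along that direction, and that decoupling two attributes means removing from one direction its component along the other. The function names, the 512-dimensional latent size, and the randomly generated "age" and "eyeglasses" directions are hypothetical placeholders, not the authors' code or learned directions.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length (v must be non-zero)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def edit(z, direction, alpha):
    """Move latent code z by a step of size alpha along a direction."""
    return [zi + alpha * di for zi, di in zip(z, direction)]


def project_out(primary, conditioned):
    """Return the unit part of `primary` orthogonal to `conditioned`.

    Assumes the two directions are not parallel; otherwise nothing
    would remain after the projection.
    """
    n1, n2 = normalize(primary), normalize(conditioned)
    overlap = dot(n1, n2)
    return normalize([a - overlap * b for a, b in zip(n1, n2)])


# Hypothetical setup: random stand-ins for a latent code and for two
# attribute directions that are deliberately correlated (entangled).
rng = random.Random(0)
dim = 512  # assumed latent size, for illustration only
z = [rng.gauss(0, 1) for _ in range(dim)]
glasses_dir = normalize([rng.gauss(0, 1) for _ in range(dim)])
age_dir = normalize(
    [g + 0.5 * rng.gauss(0, 1) for g in glasses_dir]
)

# Plain edit: moving along age_dir also shifts z along glasses_dir.
z_entangled = edit(z, age_dir, 3.0)

# Conditional edit: move only along the part of age_dir that is
# orthogonal to glasses_dir, leaving the eyeglasses coordinate fixed.
age_only = project_out(age_dir, glasses_dir)
z_decoupled = edit(z, age_only, 3.0)

shift_entangled = dot(z_entangled, glasses_dir) - dot(z, glasses_dir)
shift_decoupled = dot(z_decoupled, glasses_dir) - dot(z, glasses_dir)
```

In this sketch `shift_entangled` is clearly non-zero while `shift_decoupled` is zero up to floating-point error, which is the sense in which projection gives more precise control. In practice the edited code would be passed through a trained generator to obtain the image, a step omitted here.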
