PPGAN: Privacy-Preserving Generative Adversarial Network

TLDR

Generative Adversarial Networks can produce high‑quality synthetic data but tend to memorize training samples, raising privacy concerns when applied to sensitive information such as medical records. This work introduces a Privacy‑Preserving GAN (PPGAN) that incorporates differential privacy by adding carefully calibrated noise to the model gradients during training. PPGAN employs a Moments Accountant to manage privacy loss and includes a formal proof that its discriminator satisfies differential privacy guarantees. Experiments on benchmark datasets show that PPGAN generates synthetic data of comparable quality to non‑private GANs while maintaining a reasonable privacy budget.

Abstract

Generative Adversarial Networks (GANs) and their variants are powerful data generation models that provide researchers with large amounts of high-quality generated data. They illustrate a promising direction for research with limited data availability. When a GAN learns a semantically rich data distribution from a dataset, the density of the generated distribution tends to concentrate on the training data. Because the parameters of a deep neural network encode the distribution of the training samples, the model can easily memorize those samples. When a GAN is applied to private or sensitive data, such as patient medical records, private information may therefore be leaked. To address this issue, we propose a Privacy-Preserving Generative Adversarial Network (PPGAN) model, in which we achieve differential privacy in GANs by adding well-designed noise to the gradient during the model learning procedure. In addition, we introduce the Moments Accountant strategy into the PPGAN training process to improve the stability and compatibility of the model by controlling privacy loss. We also give a mathematical proof that the discriminator satisfies differential privacy. Through extensive case studies on benchmark datasets, we demonstrate that PPGAN can generate high-quality synthetic data while retaining the required data utility under a reasonable privacy budget.
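
The abstract does not spell out how the noise is added to the gradient. The sketch below illustrates one common way this kind of gradient perturbation is done, assuming per-example L2 clipping followed by Gaussian noise (as in DP-SGD). The function name `privatize_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical and are not taken from the paper; PPGAN's actual noise design may differ.

```python
import numpy as np


def privatize_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy average gradient (illustrative assumption, not PPGAN's exact rule).

    per_example_grads: array of shape (batch_size, num_params)
    clip_norm:         maximum L2 norm allowed for each per-example gradient
    noise_multiplier:  ratio of the noise standard deviation to clip_norm
    rng:               a numpy.random.Generator
    """
    grads = np.asarray(per_example_grads, dtype=float)

    # L2 norm of each example's gradient, kept as a column for broadcasting.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)

    # Shrink gradients whose norm exceeds clip_norm; leave the rest unchanged.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale

    # Gaussian noise calibrated to the clipping bound, added once to the sum.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])

    # Average over the batch.
    return (clipped.sum(axis=0) + noise) / grads.shape[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.normal(size=(8, 4))  # stand-in for discriminator gradients
    print(privatize_gradient(batch, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single training record can influence the update, which is what lets the noise scale be tied to a privacy guarantee. An accountant such as the Moments Accountant mentioned in the abstract would then track the cumulative privacy loss over training steps; that bookkeeping is not shown here.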
