Differentially Private Diffusion Models
Tim Dockhorn, Tianshi Cao, Arash Vahdat, Karsten Kreis
TL;DR
The paper addresses privacy-preserving generative modeling by training diffusion models with differentially private stochastic gradient descent (DP-SGD), yielding Differentially Private Diffusion Models (DPDMs). It identifies the diffusion-model parameterization and the sampling algorithm as crucial in the DP setting and proposes noise multiplicity, a modification of DP-SGD that reduces gradient variance without increasing privacy cost. DPDMs achieve state-of-the-art DP image synthesis on the benchmarks tested, and on standard benchmarks classifiers trained on DPDM-generated synthetic data perform on par with task-specific DP-SGD-trained classifiers. The results suggest that DPDMs are a practical approach to private data sharing and downstream learning.
Abstract
While modern machine learning models rely on increasingly large training datasets, data is often limited in privacy-sensitive domains. Generative models trained with differential privacy (DP) on sensitive data can sidestep this challenge, providing access to synthetic data instead. We build on the recent success of diffusion models (DMs) and introduce Differentially Private Diffusion Models (DPDMs), which enforce privacy using differentially private stochastic gradient descent (DP-SGD). We investigate the DM parameterization and the sampling algorithm, which turn out to be crucial ingredients in DPDMs, and propose noise multiplicity, a powerful modification of DP-SGD tailored to the training of DMs. We validate our novel DPDMs on image generation benchmarks and achieve state-of-the-art performance in all experiments. Moreover, on standard benchmarks, classifiers trained on DPDM-generated synthetic data perform on par with task-specific DP-SGD-trained classifiers, which has not been demonstrated before for DP generative models. Project page and code: https://nv-tlabs.github.io/DPDM.
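To make the idea of noise multiplicity concrete, the following is a minimal sketch and not the authors' implementation. It assumes that noise multiplicity means averaging each example's denoising loss gradient over K independent noise draws before the usual per-example clipping of DP-SGD. The toy elementwise linear denoiser, the uniform noise-level range, and all function and parameter names (per_example_grad, dp_sgd_step, k, clip_norm, noise_multiplier) are hypothetical choices made for illustration.

```python
import numpy as np


def per_example_grad(w, x, k, rng):
    """Gradient for ONE example, averaged over k noise draws.

    Toy assumption: an elementwise linear denoiser D(z) = w * z with loss
    ||D(x + sigma * eps) - x||^2. A real diffusion model would use a
    neural network here.
    """
    sigma = rng.uniform(0.1, 1.0, size=(k, 1))   # k noise levels (assumed range)
    eps = rng.standard_normal((k, x.shape[0]))   # k Gaussian noise draws
    x_noisy = x + sigma * eps                    # shape (k, d)
    residual = w * x_noisy - x                   # shape (k, d)
    grads = 2.0 * residual * x_noisy             # d(loss)/dw for each draw
    return grads.mean(axis=0)                    # average over the k draws


def dp_sgd_step(w, batch, k, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD step: per-example clipping, then Gaussian noise on the sum."""
    clipped = []
    for x in batch:
        g = per_example_grad(w, x, k, rng)
        # Each example still yields exactly one gradient, so clipping bounds
        # its contribution by clip_norm regardless of k.
        scale = min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        clipped.append(g * scale)
    summed = np.sum(clipped, axis=0)
    noise = noise_multiplier * clip_norm * rng.standard_normal(w.shape)
    return w - lr * (summed + noise) / len(batch)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = np.zeros(4)
    batch = rng.standard_normal((8, 4))
    w = dp_sgd_step(w, batch, k=8, clip_norm=1.0,
                    noise_multiplier=1.0, lr=0.1, rng=rng)
    print(w)
```

Because every example contributes a single clipped gradient whatever the value of k, the sensitivity of the summed gradient, and hence the privacy accounting, does not depend on k in this sketch. A larger k only costs extra computation per example.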
