DP-LDMs: Differentially Private Latent Diffusion Models
Michael F. Liu, Saiyue Lyu, Margarita Vinaroz, Mijung Park
TL;DR
The paper tackles the privacy risks of diffusion models by introducing DP-LDMs, which pretrain autoencoders and latent diffusion models on public data and privately fine-tune only the attention modules (and conditioning embedder) with DP-SGD on private data. This two-stage approach dramatically reduces trainable parameters while delivering high-resolution, conditioned image generation with differential privacy guarantees. Across multiple benchmarks, DP-LDMs achieve competitive or superior DP-utility trade-offs (as measured by FID and downstream task performance) and require substantially fewer computational resources than full-model DP fine-tuning. The work demonstrates that attention-level adaptation in latent diffusion spaces can effectively bridge domain shifts under DP constraints, offering a practical path for private generative modeling at scale.
Abstract
Diffusion models (DMs) are among the most widely used generative models for producing high-quality images. However, a flurry of recent papers shows that DMs are the least private form of image generator: a significant number of near-identical replicas of training images have been extracted from them. Existing privacy-enhancing techniques for DMs, unfortunately, do not provide a good privacy-utility trade-off. In this paper, we aim to improve the current state of DMs with differential privacy (DP) by adopting $\textit{Latent}$ Diffusion Models (LDMs). LDMs are equipped with powerful pre-trained autoencoders that map high-dimensional pixels into lower-dimensional latent representations, in which the DMs are trained, yielding faster and more efficient training. Rather than fine-tuning the entire LDM, we fine-tune only the $\textit{attention}$ modules of LDMs with DP-SGD, reducing the number of trainable parameters by roughly $90\%$ and achieving a better privacy-accuracy trade-off. Our approach allows us to generate realistic, high-dimensional images (256x256) conditioned on text prompts with DP guarantees, which, to the best of our knowledge, has not been attempted before. Our approach provides a promising direction for training more powerful yet training-efficient differentially private DMs that produce high-quality DP images. Our code is available at https://anonymous.4open.science/r/DP-LDM-4525.
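
To make the core idea concrete, the following is a minimal sketch of one DP-SGD update restricted to a subset of parameters, written in plain Python rather than taken from the authors' code. It shows the standard DP-SGD recipe (clip each example's gradient to a fixed L2 norm, sum, add Gaussian noise scaled by the clipping norm, then average) applied only to parameters marked trainable, while everything else stays frozen. The parameter names (`attn.to_q`, `resblock.conv`), the name-prefix rule for picking attention parameters, and the function names are hypothetical and chosen only for illustration; privacy accounting (turning the noise multiplier into an epsilon) is omitted.

```python
import math
import random


def clip_by_l2_norm(grad, max_norm):
    """Scale a flat gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, trainable, lr,
                clip_norm, noise_multiplier, rng):
    """One DP-SGD update applied to the trainable parameters only.

    params:            dict name -> list of floats
    per_example_grads: list of dicts (one per example), name -> list of floats
    trainable:         set of parameter names that receive updates
    """
    names = sorted(n for n in params if n in trainable)
    sizes = [len(params[n]) for n in names]
    batch = len(per_example_grads)

    # Clip each example's gradient jointly over all trainable parameters,
    # then accumulate the clipped gradients.
    total = [0.0] * sum(sizes)
    for grads in per_example_grads:
        flat = [g for n in names for g in grads[n]]
        clipped = clip_by_l2_norm(flat, clip_norm)
        total = [t + c for t, c in zip(total, clipped)]

    # Add Gaussian noise with std = noise_multiplier * clip_norm, then average.
    sigma = noise_multiplier * clip_norm
    noisy = [(t + rng.gauss(0.0, sigma)) / batch for t in total]

    # Copy all parameters; frozen ones are returned unchanged.
    new_params = {n: list(v) for n, v in params.items()}
    offset = 0
    for n, size in zip(names, sizes):
        for i in range(size):
            new_params[n][i] -= lr * noisy[offset + i]
        offset += size
    return new_params


# Hypothetical toy model: one "attention" parameter and one frozen parameter.
params = {"attn.to_q": [0.0, 0.0], "resblock.conv": [1.0, 1.0]}
grads = [
    {"attn.to_q": [3.0, 4.0], "resblock.conv": [9.0, 9.0]},  # norm 5, clipped
    {"attn.to_q": [0.3, 0.4], "resblock.conv": [9.0, 9.0]},  # norm 0.5, kept
]
# Assumption: attention parameters are identified by a name prefix.
trainable = {name for name in params if name.startswith("attn.")}

# noise_multiplier=0.0 disables the noise so the arithmetic is easy to check;
# a real private run needs a positive value.
updated = dp_sgd_step(params, grads, trainable, lr=0.1, clip_norm=1.0,
                      noise_multiplier=0.0, rng=random.Random(0))
print(updated["attn.to_q"])      # approximately [-0.045, -0.06]
print(updated["resblock.conv"])  # [1.0, 1.0], untouched because it is frozen
```

Because clipping and noise are applied only over the trainable subset, shrinking that subset lowers the dimensionality of the noised gradient, which is one intuition for why updating fewer parameters can help the privacy-accuracy trade-off.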
