DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning
Jonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan Guo
TL;DR
DP-RDM addresses the privacy risks of retrieval-augmented diffusion by introducing a differentially private retrieval mechanism that augments prompts with privatized samples without requiring fine-tuning. The approach combines private $k$-NN retrieval with noisy aggregation and query interpolation to balance privacy and image quality, backed by a Rényi-DP analysis and practical DP guarantees for multiple queries. Empirically, DP-RDM achieves competitive quality under DP on CIFAR-10, MS-COCO, and Shutterstock, with notable gains on MS-COCO as the private retrieval dataset scales (e.g., at $\epsilon=10$, FID improves from $14.4$ to $10.9$). The work demonstrates that large-scale private retrieval can enable domain adaptation for diffusion models while maintaining rigorous privacy, potentially broadening privacy-preserving deployment of generative systems in sensitive domains.
Abstract
Text-to-image diffusion models have been shown to suffer from sample-level memorization, possibly reproducing near-perfect replicas of images that they are trained on, which may be undesirable. To remedy this issue, we develop the first differentially private (DP) retrieval-augmented generation algorithm that is capable of generating high-quality image samples while providing provable privacy guarantees. Specifically, we assume access to a text-to-image diffusion model trained on a small amount of public data, and design a DP retrieval mechanism to augment the text prompt with samples retrieved from a private retrieval dataset. Our \emph{differentially private retrieval-augmented diffusion model} (DP-RDM) requires no fine-tuning on the retrieval dataset to adapt to another domain, and can use state-of-the-art generative models to generate high-quality image samples while satisfying rigorous DP guarantees. For instance, when evaluated on MS-COCO, our DP-RDM can generate samples with a privacy budget of $\epsilon=10$, while providing a $3.5$ point improvement in FID compared to public-only retrieval for up to $10{,}000$ queries.
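
The retrieval step summarized above (private $k$-NN retrieval, noisy aggregation, query interpolation) can be illustrated with a minimal Python sketch. It is not the paper's exact mechanism. The function name `noisy_knn_aggregate`, the parameters `k`, `sigma`, and `lam`, the inner-product ranking, and the linear interpolation form are all assumptions made for illustration. The noise scale here is not calibrated to any privacy budget, so the sketch by itself carries no DP guarantee.

```python
import random
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product; equals cosine similarity if both vectors are unit-normalized."""
    return sum(x * y for x, y in zip(a, b))


def noisy_knn_aggregate(
    query: Sequence[float],
    private_embs: Sequence[Sequence[float]],
    k: int,
    sigma: float,
    lam: float,
    rng: random.Random,
) -> List[float]:
    """Hypothetical sketch: average the k nearest private embeddings with
    Gaussian noise added to their sum, then interpolate with the query."""
    # Rank private embeddings by similarity to the query and keep the top k.
    ranked = sorted(private_embs, key=lambda e: dot(e, query), reverse=True)
    neighbors = ranked[:k]

    out = []
    for d in range(len(query)):
        # Noisy mean: (sum of neighbor coordinates + Gaussian noise) / k.
        # sigma is an illustrative parameter, not a calibrated DP noise scale.
        noisy_mean = (sum(e[d] for e in neighbors) + rng.gauss(0.0, sigma)) / k
        # Interpolation (assumed form): lam = 1 keeps only the query,
        # lam = 0 keeps only the noisy aggregate.
        out.append(lam * query[d] + (1.0 - lam) * noisy_mean)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    embs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
    print(noisy_knn_aggregate([1.0, 0.0], embs, k=2, sigma=0.1, lam=0.5, rng=rng))
```

In this sketch, a larger `k` shrinks the per-coordinate noise in the averaged embedding by a factor of `1/k`, which gives some intuition for why a larger private retrieval dataset (allowing more relevant neighbors) could help quality at a fixed noise level.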
