
Differentially Private Adaptation of Diffusion Models via Noisy Aggregated Embeddings

Pura Peetathawatchai, Wei-Ning Chen, Berivan Isik, Sanmi Koyejo, Albert No

TL;DR

The paper tackles privacy risks in personalizing diffusion models on small, sensitive datasets by proposing DPAgg-TI, which privately aggregates per-image Textual Inversion (TI) embeddings to adapt generation without fine-tuning the full model. By learning a separate embedding for each image and aggregating them into a noisy centroid, DPAgg-TI achieves formal $(\varepsilon,\delta)$-DP guarantees and preserves stylistic fidelity much better than DP-SGD under the same privacy budget, as demonstrated on private artwork and Paris 2024 Olympic pictograms. The approach uses subsampling to amplify privacy and normalization to bound sensitivity, enabling efficient, modular adaptation with outputs close to the non-private baseline. Experiments include perceptual user studies, KID analyses, and an ablation comparing against DP-SGD, highlighting the method’s robustness in low-data regimes and its practical implications for privacy-preserving style transfer in diffusion models.
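The aggregation step summarized above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes the per-image TI embeddings are already available as plain Python lists, that each sampled embedding is normalized to a fixed L2 norm `c`, that `m` embeddings are drawn uniformly without replacement, and that the noise standard deviation `sigma` has been calibrated elsewhere. The names `normalize` and `noisy_centroid` are illustrative.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float], c: float) -> Vector:
    """Rescale v to L2 norm exactly c, bounding each image's contribution."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero embedding")
    return [c * x / norm for x in v]


def noisy_centroid(
    embeddings: Sequence[Sequence[float]],
    m: int,
    c: float,
    sigma: float,
    rng: random.Random,
) -> Vector:
    """Hypothetical DPAgg-TI-style aggregation (illustrative, not the paper's code).

    1. Subsample m of the n per-image embeddings uniformly without replacement.
    2. Normalize each sampled embedding to L2 norm c.
    3. Average them and add i.i.d. Gaussian noise N(0, sigma^2) to every coordinate.
    """
    if not 1 <= m <= len(embeddings):
        raise ValueError("need 1 <= m <= n")
    sampled = rng.sample(list(embeddings), m)
    normed = [normalize(e, c) for e in sampled]
    dim = len(normed[0])
    centroid = [sum(e[j] for e in normed) / m for j in range(dim)]
    return [x + rng.gauss(0.0, sigma) for x in centroid]


# Usage: two toy 2-D "embeddings" with no noise, so the result is the plain centroid.
print(noisy_centroid([[3.0, 4.0], [0.0, 2.0]], m=2, c=1.0, sigma=0.0, rng=random.Random(0)))
```

With `sigma=0.0` the function returns the centroid of the normalized embeddings, i.e., a non-private reference point; any positive `sigma` trades fidelity for privacy.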

Abstract

Personalizing large-scale diffusion models poses serious privacy risks, especially when adapting to small, sensitive datasets. A common approach is to fine-tune the model using differentially private stochastic gradient descent (DP-SGD), but this suffers from severe utility degradation due to the high noise needed for privacy, particularly in the small data regime. We propose an alternative that leverages Textual Inversion (TI), which learns an embedding vector for an image or set of images, to enable adaptation under differential privacy (DP) constraints. Our approach, Differentially Private Aggregation via Textual Inversion (DPAgg-TI), adds calibrated noise to the aggregation of per-image embeddings to ensure formal DP guarantees while preserving high output fidelity. We show that DPAgg-TI outperforms DP-SGD fine-tuning in both utility and robustness under the same privacy budget, achieving results closely matching the non-private baseline on style adaptation tasks using private artwork from a single artist and Paris 2024 Olympic pictograms. In contrast, DP-SGD fails to generate meaningful outputs in this setting.
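One way to make "calibrated noise" concrete is the textbook Gaussian mechanism. The sketch below assumes each of the `m` aggregated embeddings has L2 norm exactly `c` and that neighboring datasets differ by replacing one image; under those assumptions the mean has L2 sensitivity $2c/m$. The classical calibration shown (valid for $0 < \varepsilon < 1$) is a stand-in: the paper's actual accounting, which also exploits subsampling, may differ. The function names and the values of `c` and `delta` in the usage lines are hypothetical.

```python
import math


def mean_sensitivity(c: float, m: int) -> float:
    """L2 sensitivity of the mean of m vectors, each of L2 norm c, when one is replaced.

    Swapping u_i for u_i' changes the sum by u_i' - u_i, whose norm is at most 2c,
    so the mean moves by at most 2c / m.
    """
    if m < 1 or c <= 0.0:
        raise ValueError("need m >= 1 and c > 0")
    return 2.0 * c / m


def gaussian_sigma(sensitivity: float, eps: float, delta: float) -> float:
    """Classical Gaussian-mechanism noise scale for a single (eps, delta)-DP release.

    sigma = sensitivity * sqrt(2 ln(1.25 / delta)) / eps, valid for 0 < eps < 1.
    """
    if not (0.0 < eps < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("classical bound assumes 0 < eps < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps


# Usage: hypothetical c = 1 and delta = 1e-5; compare two subsample sizes.
for m in (10, 40):
    print(m, gaussian_sigma(mean_sensitivity(1.0, m), eps=0.5, delta=1e-5))
```

Because the sensitivity scales as $1/m$, averaging more embeddings lowers the noise scale needed for the same $(\varepsilon,\delta)$.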

Paper Structure

This paper contains 30 sections, 15 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: We compare our method (DPAgg-TI, top) to a baseline applying DP-SGD to Textual Inversion (bottom), using the prompt “an icon of the Eiffel Tower in the style of the Paris 2024 Olympic Pictograms.” While the baseline learns a single embedding over the dataset, our method privately aggregates per-image embeddings. At privacy budget $\varepsilon = 1$, DPAgg-TI preserves visual fidelity much better than the baseline, and closely matches the non-private output (left), demonstrating a superior privacy-utility tradeoff.
  • Figure 2: Overview of DPAgg-TI. We first apply Textual Inversion to extract an embedding for each image in the private dataset. These embeddings are then aggregated with a differentially private mechanism that incorporates subsampling, producing a private embedding $u^*_{\text{DP}}$. Finally, images are generated using the corresponding token $S^*$.
  • Figure 3: Samples of images used in our style adaptation experiments. Left: artwork by @eveismyname ($n = 158$). Right: Paris 2024 Olympic pictograms ($n = 47$), © International Olympic Committee, 2023.
  • Figure 4: Images generated by Stable Diffusion v1.5 using the prompt "A painting of Taylor Swift in the style of <@eveismyname>", with the embedding <@eveismyname> trained using different values of $m$ and $\varepsilon$.
  • Figure 5: Images generated by Stable Diffusion v1.5 using the prompt "Icon of a dragon in the style of <Paris 2024 Pictograms>", with the embedding <Paris 2024 Pictograms> trained using different values of $m$ and $\varepsilon$.
  • ...and 9 more figures
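Figures 4 and 5 vary the subsample size $m$ together with $\varepsilon$. The sketch below shows the standard amplification-by-subsampling bound for drawing $m$ of $n$ records uniformly without replacement under substitution adjacency: $\varepsilon' = \ln\bigl(1 + \tfrac{m}{n}(e^{\varepsilon}-1)\bigr)$ and $\delta' = \tfrac{m}{n}\delta$. It is an assumption that DPAgg-TI's accounting takes exactly this form; `amplified` and `base_budget` are hypothetical names.

```python
import math
from typing import Tuple


def amplified(eps: float, delta: float, m: int, n: int) -> Tuple[float, float]:
    """Privacy of an (eps, delta)-DP mechanism run on m of n records drawn uniformly
    without replacement (substitution adjacency):
    eps' = ln(1 + (m/n)(e^eps - 1)),  delta' = (m/n) * delta.
    """
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    q = m / n
    return math.log1p(q * math.expm1(eps)), q * delta


def base_budget(target_eps: float, target_delta: float, m: int, n: int) -> Tuple[float, float]:
    """Invert `amplified`: the (eps, delta) the inner mechanism may spend so that the
    subsampled release meets (target_eps, target_delta)."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    q = m / n
    return math.log1p(math.expm1(target_eps) / q), target_delta / q


# Usage: n = 47 as for the pictogram set in Figure 3; m = 8 and delta are hypothetical.
print(base_budget(1.0, 1e-5, m=8, n=47))
```

For example, with the pictogram dataset ($n = 47$) and a hypothetical $m = 8$, `base_budget(1.0, 1e-5, 8, 47)` reports how large a budget the inner aggregation may spend while the released embedding still satisfies $(1, 10^{-5})$-DP under this bound. Smaller $m$ buys more amplification but, with a fixed-norm average, raises the per-release sensitivity.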

Theorems & Definitions (4)

  • Definition 1: (Approximate) Differential Privacy
  • Definition 2: Near Access-Freeness (Vyas et al., 2023)
  • Definition 3: Differentially Private Generation (DPG)
  • Remark 1: Scope of Protection and Artist-Level Extension
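For reference, Definition 1 in its standard textbook form is reproduced below. The neighboring relation is an assumption here (datasets differing in one image, consistent with per-image protection); Remark 1 suggests the paper also considers an artist-level unit of protection.

```latex
\textbf{Definition ($(\varepsilon,\delta)$-DP).} A randomized mechanism $\mathcal{M}$ is
$(\varepsilon,\delta)$-differentially private if, for all neighboring datasets $D, D'$
and every measurable set of outputs $S$,
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta .
\]
```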