Perturb a Model, Not an Image: Towards Robust Privacy Protection via Anti-Personalized Diffusion Models

Tae-Young Lee, Juwon Seo, Jong Hwan Ko, Gyeong-Moon Park

TL;DR

This work tackles privacy risks from personalized diffusion models by shifting protection from data to the model itself. It introduces Anti-Personalized Diffusion Models (APDM), combining Direct Protective Optimization (DPO) with Learning to Protect (L2P) to disrupt subject personalization while preserving generative quality. Theoretical analysis shows naïve loss formulations fail to converge, motivating DPO and a trajectory-aware protection strategy. Empirical results demonstrate state-of-the-art protection across diverse subjects and scenarios, while maintaining high-quality generation and the ability to personalize other, non-targeted subjects, highlighting practical, provider-friendly privacy safeguards for real-world diffusion systems.

Abstract

Recent advances in diffusion models have enabled high-quality synthesis of specific subjects, such as identities or objects. This capability, while unlocking new possibilities in content creation, also introduces significant privacy risks, as personalization techniques can be misused by malicious users to generate unauthorized content. Although several studies have attempted to counter this by generating adversarially perturbed samples designed to disrupt personalization, they rely on unrealistic assumptions and become ineffective in the presence of even a few clean images or under simple image transformations. To address these challenges, we shift the protection target from the images to the diffusion model itself to hinder the personalization of specific subjects, through our novel framework called Anti-Personalized Diffusion Models (APDM). We first provide a theoretical analysis demonstrating that naively applying existing loss functions to diffusion models is inherently incapable of ensuring convergence for robust anti-personalization. Motivated by this finding, we introduce Direct Protective Optimization (DPO), a novel loss function that effectively disrupts subject personalization in the target model without compromising generative quality. Moreover, we propose a new dual-path optimization strategy, coined Learning to Protect (L2P). By alternating between personalization and protection paths, L2P simulates future personalization trajectories and adaptively reinforces protection at each step. Experimental results demonstrate that our framework outperforms existing methods, achieving state-of-the-art performance in preventing unauthorized personalization. The code is available at https://github.com/KU-VGI/APDM.

Paper Structure

This paper contains 44 sections, 2 theorems, 50 equations, 9 figures, 16 tables, 1 algorithm.

Key Result

Proposition 1

A necessary condition for $\mathcal{L}_{adv}$ to converge to a local minimum with respect to model parameters $\theta$ is that the gradients of its constituent terms, $\nabla_\theta \mathcal{L}_{simple}^{per}$ and $\nabla_\theta \mathcal{L}_{ppl}$, must point in the same direction.
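To make the condition in Proposition 1 concrete, here is a minimal toy sketch. It assumes, purely for illustration, that the adversarial objective has the form $\mathcal{L}_{adv} = -\mathcal{L}_{simple}^{per} + \mathcal{L}_{ppl}$ (this summary does not state the exact form) and replaces both terms with hypothetical quadratic losses. The function names `grad_quadratic`, `grad_adv`, `cosine`, and `norm` are invented for this example and are not part of the APDM code. Under that assumption, a stationary point requires the two gradients to cancel, which can only happen when they are equal and therefore aligned.

```python
import math


def grad_quadratic(theta, center):
    """Gradient of the toy loss 0.5 * ||theta - center||^2."""
    return [t - c for t, c in zip(theta, center)]


def grad_adv(theta, per_center, ppl_center):
    """Gradient of the ASSUMED objective L_adv = -L_per + L_ppl at theta."""
    g_per = grad_quadratic(theta, per_center)  # stand-in for grad of L_simple^per
    g_ppl = grad_quadratic(theta, ppl_center)  # stand-in for grad of L_ppl
    return [gq - gp for gp, gq in zip(g_per, g_ppl)]


def norm(u):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in u))


def cosine(u, v):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (norm(u) * norm(v))


if __name__ == "__main__":
    theta = [0.0, 0.0]
    per_center, ppl_center = [1.0, 0.0], [0.0, 1.0]  # hypothetical toy optima
    g_per = grad_quadratic(theta, per_center)
    g_ppl = grad_quadratic(theta, ppl_center)
    print("alignment of the two gradients:", cosine(g_per, g_ppl))
    print("norm of grad L_adv:", norm(grad_adv(theta, per_center, ppl_center)))
```

In this toy setting the gradient of the assumed $\mathcal{L}_{adv}$ equals `per_center - ppl_center` for every `theta`, so it never vanishes unless the two toy optima coincide. This only illustrates the alignment condition; it is not a reproduction of the paper's proof or losses.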

Figures (9)

  • Figure 1: Motivation Figure. Existing protection approaches face critical limitations: (a) impracticality of applying data-poisoning to all images, (b) vulnerability to easy circumvention of protection methods, (c) high entry barriers for non-expert users, and (d) incompatibility with service providers who must comply with privacy regulations.
  • Figure 2: Overview. To prevent personalization at the parameter level, we propose the Anti-Personalized Diffusion Model (APDM). (a) APDM first generates a paired image for each clean input image $x_0$. (b) APDM consists of two components: (i) Learning to Protect, a novel optimization algorithm that makes the protection procedure aware of personalization trajectories, and (ii) the Direct Protective Optimization loss, designed to disrupt personalization while preserving generation capabilities.
  • Figure 3: Qualitative Comparison on Protection. We compared the baselines and APDM in terms of protection. We tested the baselines under three settings: "All Perturbed", "One Clean", and "One Perturbed". In the "All Perturbed" setting, the baselines added perturbations to all training images. The "One Clean" and "One Perturbed" settings, in which the dataset contains one clean image or one perturbed image, respectively, are more difficult than the "All Perturbed" setting.
  • Figure 4: FID variation during training with $\mathcal{L}_{adv}$. We measured image quality via the FID score (heusel2017gans) on the COCO 2014 (lin2014microsoft) validation dataset. We also plot the FID scores of Stable Diffusion 1.5 and APDM.
  • Figure 5: Protection on other subjects. We attempted to prevent personalization of "cat", "sneaker", "glasses", and "clock".
  • ...and 4 more figures

Theorems & Definitions (2)

  • Proposition 1
  • Theorem 1