Latent Diffusion Unlearning: Protecting Against Unauthorized Personalization Through Trajectory Shifted Perturbations

Naresh Kumar Devulapally, Shruti Agarwal, Tejas Gokhale, Vishnu Suresh Lokhande

TL;DR

This work tackles the risk of unauthorized personalization of text-to-image diffusion models by proposing a latent-space unlearning approach. It introduces a trajectory-shift perturbation, implemented as a lightweight UNet, that perturbs the terminal latent $z_T$ of the inversion-denoising process in Latent Diffusion Models. The result is a set of high-fidelity images that are difficult to personalize. The method optimizes a Lagrangian objective that maximizes the downstream personalization loss while enforcing an imperceptibility budget. It remains robust against purification attacks such as DiffPure and generalizes across diffusion pipelines and Stable Diffusion (SD) versions. Empirical results on four benchmarks show substantial improvements in perceptual quality (PSNR/SSIM/FID) and strong protection against Textual Inversion (TI) and DreamBooth (DB) personalization, with practical training and inference efficiency. This latent-space defense offers a scalable, model-agnostic safeguard for personal data and IP in generative systems, in line with evolving data-rights considerations.
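The Lagrangian trade-off described above can be illustrated with a toy primal-dual loop. This is a minimal sketch under stated assumptions, not the authors' training code. The perturbation is a plain vector `delta` rather than the output of the UNet $\rho$, and the personalization loss is replaced by a hypothetical linear surrogate `w · delta`. The imperceptibility budget is assumed to be the squared-norm constraint `||delta||^2 <= eps^2`. Gradient ascent on `delta` pushes the surrogate loss up, while dual ascent on the multiplier `lam` raises the penalty whenever the budget is exceeded.

```python
import math
from typing import List, Tuple


def primal_dual_budget(
    w: List[float], eps: float, lr: float = 0.05, steps: int = 2000
) -> Tuple[List[float], float]:
    """Toy gradient descent-ascent on L(delta, lam) = w.delta - lam * (||delta||^2 - eps^2).

    Assumptions (not from the paper): `w` is a hypothetical linear surrogate for the
    personalization loss, and the budget is a squared L2 norm on `delta`.
    """
    delta = [0.0] * len(w)
    lam = 0.0
    for _ in range(steps):
        # Constraint value g(delta) = ||delta||^2 - eps^2; g <= 0 means within budget.
        g = sum(d * d for d in delta) - eps ** 2
        # Primal ascent on delta: dL/d(delta) = w - 2 * lam * delta.
        new_delta = [d + lr * (wi - 2.0 * lam * d) for d, wi in zip(delta, w)]
        # Dual ascent on the multiplier, projected onto lam >= 0.
        lam = max(0.0, lam + lr * g)
        delta = new_delta
    return delta, lam


delta, lam = primal_dual_budget([3.0, 4.0], eps=1.0)
print([round(d, 4) for d in delta], round(lam, 4), round(math.hypot(*delta), 4))
```

With `w = [3, 4]` and `eps = 1`, the loop settles on the budget boundary at `delta ≈ [0.6, 0.8]` with `lam ≈ 2.5`. This is the KKT point `delta = w / (2 lam)`, `||delta|| = eps`. Using a multiplier instead of a hard projection lets the penalty adapt to how strongly the loss pushes against the budget.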

Abstract

Text-to-image diffusion models have demonstrated remarkable effectiveness in rapid, high-fidelity personalization, even when given only a few user images. However, the effectiveness of personalization techniques has led to concerns about data privacy, intellectual property protection, and unauthorized usage. To mitigate such unauthorized usage and model replication, researchers have proposed generating "unlearnable" training samples through image poisoning. Existing methods of this kind have limited imperceptibility because they operate in pixel space, which produces images with noise and artifacts. In this work, we propose a novel model-based perturbation strategy that operates within the latent space of diffusion models. Our method alternates between denoising and inversion while modifying the starting point of the denoising trajectory. This trajectory-shifted sampling ensures that the perturbed images maintain high visual fidelity to the original inputs while resisting inversion and personalization by downstream generative models. The approach integrates unlearnability into the framework of Latent Diffusion Models (LDMs), enabling a practical and imperceptible defense against unauthorized model adaptation. We validate our approach on four benchmark datasets and demonstrate robustness against state-of-the-art inversion attacks. Our method improves imperceptibility by $\sim 8\%$ to $10\%$ on perceptual metrics including PSNR, SSIM, and FID. It also improves robustness by $\sim 10\%$ on average across five adversarial settings, highlighting its effectiveness in safeguarding sensitive data.
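The trajectory-shifted sampling step can be sketched as a composition of callables. All names here (`invert`, `perturb`, `denoise_step`, `trajectory_shifted_sample`) are hypothetical placeholders. In the paper's setting, `perturb` would play the role of the learned model $\rho$ and `denoise_step` a few-step shortcut denoiser (e.g. $k=4$). The VAE encode/decode stages of an LDM are omitted, and only a single inversion-perturb-denoise pass is shown. The toy callables at the bottom exercise the control flow only.

```python
from typing import Callable, List

Latent = List[float]


def trajectory_shifted_sample(
    x0: Latent,
    invert: Callable[[Latent], Latent],
    perturb: Callable[[Latent], Latent],
    denoise_step: Callable[[Latent, int], Latent],
    k: int = 4,
) -> Latent:
    """Invert a clean latent to z_T, shift the trajectory's start, then denoise in k steps."""
    z_T = invert(x0)            # hypothetical inversion to the terminal latent z_T
    z = perturb(z_T)            # trajectory shift: perturbed starting point
    for t in reversed(range(k)):
        z = denoise_step(z, t)  # hypothetical few-step denoiser, t = k-1, ..., 0
    return z                    # perturbed clean latent (bar z_0^ul in the paper's notation)


# Toy stand-ins (hypothetical) that only exercise the control flow.
def toy_invert(z: Latent) -> Latent:
    return [2.0 * v for v in z]


def toy_perturb(z: Latent) -> Latent:
    return [v + 0.1 for v in z]


def no_perturb(z: Latent) -> Latent:
    return list(z)


def toy_denoise(z: Latent, t: int) -> Latent:
    return [0.5 * v for v in z]  # each toy step halves the latent


clean = trajectory_shifted_sample([1.0], toy_invert, no_perturb, toy_denoise, k=2)
shifted = trajectory_shifted_sample([1.0], toy_invert, toy_perturb, toy_denoise, k=2)
print(clean, shifted)
```

In this toy, every denoising step halves the shift applied at $z_T$, so fewer steps leave more of it intact. This mirrors, only loosely, the few-step argument in the Figure 2 caption.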

Paper Structure

This paper contains 44 sections, 13 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Training Pipeline: We train the parameters of the unlearnable perturbation model $\rho$. This model perturbs the initial point of the denoising trajectory in the latent space to generate $\bar{z}^{\text{ul}}_T$, which is passed to a shortcut diffusion model with $k=4$ steps. A pre-trained, frozen personalization model ($\Phi^{\text{personalize}}$) follows, and $\rho$ is trained to maximize $\mathcal{L}_{\text{personalize}}$.
  • Figure 2: Few-step Diffusion Models maintain data distribution integrity while allowing perturbations in reconstruction: Our method generates unlearnable samples by perturbing the noised latent $z_T$ and then applying a denoising model $\Phi^{\text{denoise}}$ to produce $\bar{z}^{\text{ul}}_0$. We empirically analyze the properties of $\Phi^{\text{denoise}}$ that allow meaningful perturbations $\Delta z_T$ to survive while preserving the underlying data distribution. Using a spiral dataset, we compare the curve-fit error $e_L = \Delta(\text{Curve Fit}, \text{Data})$ and the sample-level reconstruction error $e_R = \Delta(\Phi^{\text{denoise}}, \text{Data})$ across different denoising step counts $k$. Multi-step denoising ($k=8$) minimizes both $e_L$ and $e_R$, suppressing perturbations and thus limiting adversarial behavior. In contrast, fewer denoising steps ($k=4$) yield high $e_R$ with low $e_L$. This identifies a "feasible region" where latent perturbations persist through $\Phi^{\text{denoise}}$ while distributional integrity is maintained, which is critical for injecting imperceptible unlearnable signals. This indicates that few-step (particularly 1-step) diffusion models are optimal for latent-space unlearnable sample generation. The plot (right) shows the trends of $e_R$ and $e_L$ at different $k$ values. The curve fit is used to assess whether distribution integrity is maintained.
  • Figure 3: $\rho$ is trained in the presence of $\Phi^{\text{denoise}}$ to get $\bar{z}^{\text{ul}}_T$.
  • Figure 4: Qualitative Results Comparison to Baseline Methods: Three illustrative examples compared to baselines, with a maximum budget of $10/255$. Each example shows (top row) the unlearnable samples after Step 1, (bottom left) an enlarged random region of the image to highlight differences in perturbation, and (bottom right) the personalization result using TI after $15$ steps of DiffPure purification. Existing methods that rely on pixel-level perturbations add strong, visible noise and artifacts to the image without providing identity protection. Best viewed full-screen and zoomed in to see the differences in perturbation clearly.
  • Figure 5: DiffPure Purification after Personalization: We stress-test our method against increasingly strong DiffPure purification to demonstrate its resistance to advanced adversarial purification attacks. Our method keeps the identity protected over $150$ DiffPure steps of purification. Only at around $250$ steps does personalization succeed with the correct identity.
  • ...and 7 more figures
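Figure 4 uses a maximum budget of $10/255$, and the abstract reports PSNR. The snippet below is a hedged illustration that checks the budget and computes PSNR. It assumes pixel values in $[0, 1]$ and an $L_\infty$ budget, since this excerpt does not state the exact norm. The helper names `psnr` and `within_linf_budget` are illustrative, not from the paper. Under that assumption every pixel differs by at most $10/255$, so the MSE is at most $(10/255)^2$ and PSNR is bounded below by $20\log_{10}(25.5) \approx 28.13$ dB. A uniform shift of exactly $10/255$ attains that bound.

```python
import math
from typing import Sequence


def psnr(original: Sequence[float], perturbed: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for flattened pixel values in [0, max_val]."""
    mse = sum((a - b) ** 2 for a, b in zip(original, perturbed)) / len(original)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def within_linf_budget(
    original: Sequence[float],
    perturbed: Sequence[float],
    budget: float = 10 / 255,
    tol: float = 1e-12,
) -> bool:
    """True if no pixel moves by more than `budget` (assumed L-infinity budget)."""
    # `tol` absorbs floating-point rounding in the subtraction.
    return max(abs(a - b) for a, b in zip(original, perturbed)) <= budget + tol


clean = [0.2, 0.5, 0.8]
shifted = [p + 10 / 255 for p in clean]   # worst case: every pixel at the budget
too_far = [p + 12 / 255 for p in clean]   # exceeds the budget

print(within_linf_budget(clean, shifted))   # True
print(within_linf_budget(clean, too_far))   # False
print(round(psnr(clean, shifted), 2))       # 28.13
```

Perturbations that use the budget only sparsely (most pixels unchanged) score well above this worst-case PSNR. This is why an $L_\infty$ budget alone says little about perceived quality.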