Fragile by Design: On the Limits of Adversarial Defenses in Personalized Generation

Zhen Chen, Yi Zhang, Xiangyu Yin, Chengxuan Qin, Xingyu Zhao, Xiaowei Huang, Wenjie Ruan

TL;DR

This work examines the privacy risks of personalized generation in diffusion-based models (e.g., DreamBooth) and critiques existing anti-personalization defenses that inject perturbations into user images. It introduces AntiDB_Purify, a framework to evaluate defenses under realistic purification threats, including both traditional image filters and adversarial purification. Across multiple defenses (Anti-DreamBooth, HF-ADB, SimAC, DisDiff), the study shows that purification, whether simple filtering or diffusion-based purification, strips away the protection and lets DreamBooth memorize and reproduce target identities again. The findings highlight a false sense of security in current defenses and call for more imperceptible, robust protections to safeguard user identity in personalized generation systems, with HF-ADB offering a possible direction for future work.
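Diffusion-based purification, represented in the figures by DiffPure, first diffuses the protected image forward with Gaussian noise and then runs the reverse denoising process back to an image. As a hedged illustration in standard DDPM-style notation (the purification timestep $t^*$ and the noise schedule $\beta_t$ are not given in this summary and are assumptions here), the forward step applied to a protected image $x_{\mathrm{adv}}$ is:

```latex
x_{t^*} = \sqrt{\bar{\alpha}_{t^*}}\, x_{\mathrm{adv}} + \sqrt{1 - \bar{\alpha}_{t^*}}\, \varepsilon,
\qquad \varepsilon \sim \mathcal{N}(0, I),
\qquad \bar{\alpha}_{t} = \prod_{s=1}^{t} (1 - \beta_s)
```

The reverse process then maps $x_{t^*}$ to a purified estimate $\hat{x}_0$. When the injected noise scale $\sqrt{1 - \bar{\alpha}_{t^*}}$ is large relative to a small, norm-bounded perturbation, the adversarial pattern tends to be washed out while coarse facial structure can survive.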

Abstract

Personalized AI applications such as DreamBooth enable the generation of customized content from user images, but also raise significant privacy concerns, particularly the risk of facial identity leakage. Recent defense mechanisms like Anti-DreamBooth attempt to mitigate this risk by injecting adversarial perturbations into user photos to prevent successful personalization. However, we identify two critical yet overlooked limitations of these methods. First, the adversarial examples often exhibit perceptible artifacts such as conspicuous patterns or stripes, making them easily detectable as manipulated content. Second, the perturbations are highly fragile, as even a simple, non-learned filter can effectively remove them, thereby restoring the model's ability to memorize and reproduce user identity. To investigate this vulnerability, we propose a novel evaluation framework, AntiDB_Purify, to systematically test existing defenses under realistic purification threats, including both traditional image filters and adversarial purification. Results reveal that none of the current methods maintains its protective effectiveness under such threats. These findings highlight that current defenses offer a false sense of security and underscore the urgent need for more imperceptible and robust protections to safeguard user identity in personalized generation.
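The claim that a simple, non-learned filter can remove the protective perturbation can be illustrated with a minimal NumPy sketch of the purification pipeline named in the figure captions: a bilateral filter followed by a guided filter. This is not the authors' AntiDB_Purify code. It assumes a single grayscale channel in [0, 1], uses a self-guided guided filter (the bilateral output serves as its own guide), and every parameter value (radii, `sigma_s`, `sigma_r`, `eps`) is hypothetical, as is the random ±8/255 sign pattern standing in for a real Anti-DreamBooth perturbation.

```python
import numpy as np


def box_filter(img: np.ndarray, r: int) -> np.ndarray:
    """Mean over a (2r+1) x (2r+1) window via an integral image (edge padding)."""
    k = 2 * r + 1
    padded = np.pad(img, r, mode="edge")
    s = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    s[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    window_sum = s[k:k + h, k:k + w] - s[:h, k:k + w] - s[k:k + h, :w] + s[:h, :w]
    return window_sum / (k * k)


def bilateral_filter(img, r=2, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter: spatial Gaussian times intensity Gaussian."""
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")
    num = np.zeros_like(img)
    den = np.zeros_like(img)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            neighbor = padded[r + dy:r + dy + h, r + dx:r + dx + w]
            weight = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2)) * np.exp(
                -((neighbor - img) ** 2) / (2 * sigma_r ** 2)
            )
            num += weight * neighbor
            den += weight
    return num / den  # den >= 1 because the center pixel always has weight 1


def guided_filter(guide, src, r=4, eps=1e-3):
    """Guided filter (He et al.): fit src ~ a * guide + b in each local window."""
    mean_i = box_filter(guide, r)
    mean_p = box_filter(src, r)
    cov_ip = box_filter(guide * src, r) - mean_i * mean_p
    var_i = box_filter(guide * guide, r) - mean_i ** 2
    a = cov_ip / (var_i + eps)
    b = mean_p - a * mean_i
    return box_filter(a, r) * guide + box_filter(b, r)


def purify(img):
    """Hypothetical purifier: bilateral filter, then a self-guided guided filter."""
    smoothed = bilateral_filter(img)
    return np.clip(guided_filter(smoothed, smoothed), 0.0, 1.0)


# Synthetic usage: a smooth ramp plus a random +/-8/255 sign pattern (not a real attack).
n = 64
yy, xx = np.mgrid[0:n, 0:n]
clean = 0.2 + 0.6 * (yy + xx) / (2 * (n - 1))
rng = np.random.default_rng(0)
adversarial = np.clip(clean + (8 / 255) * rng.choice([-1.0, 1.0], size=clean.shape), 0.0, 1.0)
purified = purify(adversarial)
print(np.abs(adversarial - clean).mean(), np.abs(purified - clean).mean())
```

The bilateral step averages mostly over neighbors of similar intensity, so it suppresses small, high-frequency perturbations while keeping strong edges; the guided filter then smooths the remainder in an edge-aware way. A practical pipeline would apply this per color channel or use a library routine such as OpenCV's `cv2.bilateralFilter`.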

Paper Structure

This paper contains 28 sections, 11 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Purification can remove adversarial perturbations, thereby undermining the effectiveness of anti-personalization.
  • Figure 2: Fourier magnitude spectrum of clean images, adversarial examples generated by Anti-DreamBooth, and the corresponding purified images obtained with bilateral filtering followed by guided filtering.
  • Figure 3: Visualization of clean images, heatmaps of adversarial examples generated by Anti-DreamBooth, and their purified counterparts obtained with bilateral filtering followed by guided filtering, and with DiffPure.
  • Figure 4: Visualization of clean portrait images (first row), adversarial examples generated by each anti-personalization method (second row), DreamBooth output on adversarial examples (middle three rows), and DreamBooth output on images purified with DiffPure (last three rows). Text prompts are "a photo of sks person", "a dslr portrait of sks person", and "a photo of sks person looking at the mirror".
  • Figure 5: Visualization of clean portrait images (first row); DreamBooth output on SimAC-generated adversarial examples and on their DiffPure-purified images (middle two rows); DreamBooth output on HF-SimAC-generated adversarial examples and on their DiffPure-purified images (last two rows). Text prompts are "a photo of sks person", "a dslr portrait of sks person", "a photo of sks person looking at the mirror", and "a photo of sks person in front of the Eiffel Tower".
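Figure 2 contrasts the Fourier magnitude spectra of clean, adversarial, and purified images. A minimal NumPy sketch of how such a centered spectrum, plus a simple scalar summary of its high-frequency content, might be computed is shown here. The high-frequency energy ratio and its cutoff radius, the Gaussian-blob test image, and the random ±8/255 pattern standing in for an Anti-DreamBooth perturbation are all illustrative assumptions, not the paper's analysis.

```python
import numpy as np


def log_magnitude_spectrum(img: np.ndarray) -> np.ndarray:
    """Centered log-magnitude spectrum (zero frequency moved to the middle)."""
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    return np.log1p(np.abs(spectrum))


def high_freq_energy_ratio(img: np.ndarray, cutoff_frac: float = 0.125) -> float:
    """Hypothetical metric: share of spectral power farther than cutoff from the center."""
    h, w = img.shape
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    yy, xx = np.mgrid[0:h, 0:w]
    # fftshift places the zero-frequency bin at index (h // 2, w // 2).
    radius = np.hypot(yy - h // 2, xx - w // 2)
    mask = radius > cutoff_frac * min(h, w)
    return float(power[mask].sum() / power.sum())


# Synthetic usage: a smooth Gaussian blob vs. the same blob plus a +/-8/255 sign pattern.
n = 64
yy, xx = np.mgrid[0:n, 0:n]
clean = 0.2 + 0.6 * np.exp(-((yy - n / 2) ** 2 + (xx - n / 2) ** 2) / (2 * (n / 6) ** 2))
rng = np.random.default_rng(0)
perturbed = np.clip(clean + (8 / 255) * rng.choice([-1.0, 1.0], size=clean.shape), 0.0, 1.0)
print(high_freq_energy_ratio(clean), high_freq_energy_ratio(perturbed))
```

The log-magnitude map is what spectrum figures typically display; the ratio condenses it into one number that rises when a perturbation adds energy away from the low-frequency center, and would be expected to fall again after effective purification.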