Make the Most of Everything: Further Considerations on Disrupting Diffusion-based Customization

Long Tang, Dengpan Ye, Sirun Chen, Xiuwen Shi, Yunna Lv, Ziyi Liu

TL;DR

This work addresses the security risks of diffusion-based image customization by introducing DADiff, a two-stage adversarial attack that operates at both the prompt level and the image level. It uses an Adversarial Prompt Vector and targeted disruptions of self- and cross-attention within the diffusion UNet, augmented by a Local Random Timestep Gradient Ensemble to better capture gradient information across timesteps. Empirical results on CelebA-HQ and VGGFace2 show 10-30% improvements in cross-prompt, keyword mismatch, cross-model, and cross-mechanism anti-customization, indicating strong transferability and robustness across prompts, models, and customization mechanisms. The findings highlight practical risks of diffusion-based personalization and suggest directions for more effective defenses and safer deployment on platforms employing Dreambooth-like customization.

Abstract

The fine-tuning technique for text-to-image diffusion models facilitates image customization but risks privacy breaches and opinion manipulation. Current research focuses on prompt- or image-level adversarial attacks for anti-customization, yet it overlooks the correlation between these two levels and the relationship between internal modules and inputs. This hinders anti-customization performance in practical threat scenarios. We propose Dual Anti-Diffusion (DADiff), a two-stage adversarial attack targeting diffusion customization, which, for the first time, integrates the adversarial prompt-level attack into the generation process of image-level adversarial examples. In stage 1, we generate prompt-level adversarial vectors to guide the subsequent image-level attack. In stage 2, besides conducting the end-to-end attack on the UNet model, we disrupt its self- and cross-attention modules, aiming to break the correlations between image pixels and align the cross-attention results computed using instance prompts and adversarial prompt vectors within the images. Furthermore, we introduce a local random timestep gradient ensemble strategy, which updates adversarial perturbations by integrating random gradients from multiple segmented timesets. Experimental results on various mainstream facial datasets demonstrate 10%-30% improvements in cross-prompt, keyword mismatch, cross-model, and cross-mechanism anti-customization with DADiff compared to existing methods.
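The timestep gradient ensemble described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the ensemble draws one random timestep from each contiguous segment of the timestep range and averages the resulting gradients before a standard PGD sign step. The function names, the averaging rule, and the toy loss are all hypothetical stand-ins; in practice the gradient would come from the diffusion model's denoising loss.

```python
import random


def sample_segmented_timesteps(num_timesteps, num_segments, rng):
    """Draw one random timestep from each equal-width segment of [0, num_timesteps)."""
    seg = num_timesteps // num_segments
    return [rng.randrange(i * seg, (i + 1) * seg) for i in range(num_segments)]


def toy_loss_grad(x_adv, t, num_timesteps):
    """Hypothetical stand-in for the gradient of a denoising loss w.r.t. the image.

    Toy loss: 0.5 * w(t) * ||x||^2, so the gradient is w(t) * x.
    """
    w = 1.0 - t / num_timesteps
    return [w * v for v in x_adv]


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pgd_with_timestep_ensemble(x, eps=8 / 255, alpha=1 / 255, steps=10,
                               num_timesteps=1000, num_segments=4, seed=0):
    """PGD (L-infinity) where each step averages gradients over segmented random timesteps."""
    rng = random.Random(seed)
    x_adv = list(x)
    for _ in range(steps):
        ts = sample_segmented_timesteps(num_timesteps, num_segments, rng)
        grads = [toy_loss_grad(x_adv, t, num_timesteps) for t in ts]
        # Assumed ensemble rule: plain average of the per-timestep gradients.
        avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(x_adv))]
        # Gradient-ascent sign step, then projection onto the eps-ball around x.
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, avg)]
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
        # Keep pixel values in the valid [0, 1] range.
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


if __name__ == "__main__":
    clean = [0.2, 0.5, 0.9]  # a flattened toy "image"
    print(pgd_with_timestep_ensemble(clean))
```

Sampling one timestep per segment is one way to ensure that every update sees gradients from different parts of the timestep range, rather than relying on a single randomly chosen timestep.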

Paper Structure

This paper contains 35 sections, 12 equations, 10 figures, 8 tables, 1 algorithm.

Figures (10)

  • Figure 1: Comparison of vanilla Dreambooth customization, SOTA anti-customization methods, and proposed DADiff. Inference prompts in the grey background are unknown prompts for anti-customization. DADiff achieves a more thorough disruption, making generated images hard to recognize, and has better transferability across different black-box inference prompts.
  • Figure 2: The pipeline of DADiff, where only the components marked with the flame icon are updated at each step. We first execute stage (a) to obtain the Adversarial Prompt Vector (APV), and then use the APV and the instance prompt to generate image-level adversarial examples in stage (b).
  • Figure 3: Generated images guided by original prompts (column 2) and APVs (columns 3-6), starting from the prompt "A photo of a woman." Row 1: images from random noise. Row 2: images from the DDIM-inverted initial image. APVs are created after 10, 50, 100, and 500 iterations.
  • Figure 4: Comparisons of losses (first row) and gradient scores (second row) when performing a PGD attack using a single timestep (blue lines) and LRTGE (red lines). The PGD attack runs for 300 iterations in total (50 ASPL iterations [van2023anti], each with 6 inner iterations).
  • Figure 5: Attention heatmaps of generated images from vanilla Dreambooth training and different anti-customization methods.
  • ...and 5 more figures