Make the Most of Everything: Further Considerations on Disrupting Diffusion-based Customization

Long Tang; Dengpan Ye; Sirun Chen; Xiuwen Shi; Yunna Lv; Ziyi Liu


TL;DR

This work addresses the security risks of diffusion-based image customization by introducing DADiff, a two-stage adversarial attack that couples prompt-level and image-level attacks. It combines an Adversarial Prompt Vector with targeted disruption of the self- and cross-attention modules in the diffusion UNet, and adds a Local Random Timestep Gradient Ensemble to better exploit timestep-dependent gradient information. Empirical results on CelebA-HQ and VGGFace2 show consistent 10%-30% improvements in cross-prompt, keyword mismatch, cross-model, and cross-mechanism anti-customization, indicating strong transferability and robustness across prompts, models, and customization mechanisms. The findings highlight the practical risks of diffusion-based personalization and point toward more effective defenses and safer deployment on platforms that offer DreamBooth-like customization.
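The attention-level disruption can be illustrated with a small, dependency-free sketch. It is not the authors' implementation, and the exact losses are not given here, so the ones below are assumptions chosen for illustration: the self-attention term is the mean squared distance between the perturbed and clean self-attention maps (to be increased, breaking pixel correlations), and the cross-attention term is the mean squared distance between the maps computed with instance-prompt keys and with adversarial-prompt-vector keys (to be decreased, aligning the two). The function names, the weights `lam_self` and `lam_cross`, and the assumption that both prompts have the same token count are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Scaled dot-product attention weights softmax(Q K^T / sqrt(d)), one row per query."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        for q in queries
    ]


def mse(a, b):
    """Mean squared difference between two equally shaped matrices."""
    diffs = [(va - vb) ** 2 for ra, rb in zip(a, b) for va, vb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def attention_objective(img_q, img_k, text_k_instance, text_k_adv,
                        clean_self_attn, lam_self=1.0, lam_cross=1.0):
    """Hypothetical stage-2 attention objective, to be MAXIMIZED over the perturbation.

    img_q, img_k: queries/keys of the perturbed image's tokens (pixels).
    text_k_instance / text_k_adv: keys from the instance prompt and from the
    adversarial prompt vector (assumed to have the same token count).
    clean_self_attn: self-attention map of the unperturbed image.
    """
    # Self-attention term: push the perturbed map away from the clean one.
    self_term = mse(attention_map(img_q, img_k), clean_self_attn)
    # Cross-attention term: pull instance-prompt attention toward adversarial-vector attention.
    cross_term = mse(attention_map(img_q, text_k_instance),
                     attention_map(img_q, text_k_adv))
    return lam_self * self_term - lam_cross * cross_term
```

In a real attack these maps would be read from the UNet's attention layers and the objective would be differentiated with respect to the image perturbation by an autograd framework; plain Python lists are used only to keep the example self-contained.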

Abstract

Fine-tuning techniques for text-to-image diffusion models facilitate image customization but create risks of privacy breaches and opinion manipulation. Current research focuses on prompt- or image-level adversarial attacks for anti-customization, yet it overlooks both the correlation between these two levels and the relationship between the model's internal modules and its inputs. This limits anti-customization performance in practical threat scenarios. We propose Dual Anti-Diffusion (DADiff), a two-stage adversarial attack targeting diffusion customization, which, for the first time, integrates the adversarial prompt-level attack into the generation process of image-level adversarial examples. In stage 1, we generate prompt-level adversarial vectors to guide the subsequent image-level attack. In stage 2, in addition to conducting an end-to-end attack on the UNet model, we disrupt its self- and cross-attention modules: the self-attention attack aims to break the correlations between image pixels, while the cross-attention attack aligns the cross-attention results computed with the instance prompts and with the adversarial prompt vectors within the images. Furthermore, we introduce a local random timestep gradient ensemble strategy, which updates the adversarial perturbations by integrating random gradients from multiple segmented timestep sets. Experimental results on mainstream facial datasets demonstrate 10%-30% improvements over existing methods in cross-prompt, keyword mismatch, cross-model, and cross-mechanism anti-customization.
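The local random timestep gradient ensemble can be sketched as follows, under stated assumptions. The number of segments, the sampling rule, and the update rule are not specified here, so this example assumes that the timestep range [0, T) is split into K contiguous segments, that one timestep is drawn uniformly from each segment, and that the averaged gradient drives a sign-gradient (PGD-style) ascent step projected onto an L∞ ball of radius `eps`. `grad_fn` is a hypothetical stand-in for the gradient of the attack loss with respect to the perturbation at timestep `t`; all function names are illustrative.

```python
import random


def segment_bounds(num_timesteps, num_segments):
    """Split [0, num_timesteps) into contiguous, near-equal half-open segments."""
    if not 1 <= num_segments <= num_timesteps:
        raise ValueError("need 1 <= num_segments <= num_timesteps")
    base, extra = divmod(num_timesteps, num_segments)
    bounds, start = [], 0
    for i in range(num_segments):
        end = start + base + (1 if i < extra else 0)  # first `extra` segments get one more step
        bounds.append((start, end))
        start = end
    return bounds


def sample_local_timesteps(num_timesteps, num_segments, rng):
    """Draw one timestep uniformly at random from each segment."""
    return [rng.randrange(lo, hi) for lo, hi in segment_bounds(num_timesteps, num_segments)]


def ensemble_update(delta, grad_fn, num_timesteps, num_segments, alpha, eps, rng):
    """One PGD-style step using gradients averaged over locally sampled timesteps."""
    timesteps = sample_local_timesteps(num_timesteps, num_segments, rng)
    avg_grad = [0.0] * len(delta)
    for t in timesteps:
        for i, g in enumerate(grad_fn(delta, t)):
            avg_grad[i] += g / len(timesteps)
    updated = []
    for d, g in zip(delta, avg_grad):
        sign = (g > 0) - (g < 0)                      # sign(g) in {-1, 0, 1}
        stepped = d + alpha * sign                    # ascend the attack loss
        updated.append(max(-eps, min(eps, stepped)))  # project onto the L-inf ball
    return updated, timesteps
```

Drawing one timestep per segment, rather than K timesteps from the whole range, guarantees that every region of the noise schedule contributes to each update while still randomizing within each region. A full attack would also clip the perturbed image to the valid pixel range.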
