Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models

Cong Wan; Yuhang He; Xiang Song; Yihong Gong

TL;DR

A Prompt-Agnostic Adversarial Perturbation (PAP) method for customized diffusion models that effectively defends against prompt-agnostic attacks, improving defense stability.

Abstract

Diffusion models have revolutionized customized text-to-image generation, enabling efficient synthesis of photos from personal data guided by textual descriptions. However, these advances also bring risks, including privacy breaches and unauthorized replication of artworks. Previous research has primarily relied on prompt-specific methods to generate adversarial examples that protect personal images, but the effectiveness of these methods is limited by their poor adaptability to different prompts. In this paper, we introduce a Prompt-Agnostic Adversarial Perturbation (PAP) method for customized diffusion models. PAP first models the prompt distribution using a Laplace Approximation, and then produces prompt-agnostic perturbations by maximizing a disturbance expectation based on the modeled distribution. This approach effectively defends against prompt-agnostic attacks, leading to improved defense stability. Extensive experiments on face privacy and artistic style protection demonstrate the superior generalization of PAP compared to existing techniques. Our project page is available at https://github.com/vancyland/Prompt-Agnostic-Adversarial-Perturbation-for-Customized-Diffusion-Models.github.io.
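The two-step procedure described above (model the prompt distribution, then maximize an expected disturbance) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes the Laplace approximation has already produced a Gaussian over prompt embeddings with mean `mu` (the mode) and covariance `cov` (the inverse Hessian, as a Laplace approximation gives), replaces the diffusion loss with a hypothetical quadratic `toy_loss_grad`, and uses a Monte Carlo average of gradients inside a signed-gradient (PGD-style) ascent under an L-infinity budget `eps`. The function names, the shared image/prompt dimension, and all hyperparameters are illustrative assumptions.

```python
import numpy as np


def toy_loss_grad(x, c):
    # HYPOTHETICAL stand-in for the diffusion loss L(x, c) = ||x - c||^2,
    # whose gradient with respect to the image x is 2 (x - c).
    return 2.0 * (x - c)


def expected_disturbance_pgd(x0, mu, cov, eps=0.05, alpha=0.01,
                             steps=20, n_samples=8, seed=0):
    """Sketch: maximize E_{c ~ N(mu, cov)}[L(x0 + delta, c)] s.t. ||delta||_inf <= eps.

    `mu` and `cov` play the role of a Laplace-approximated prompt
    distribution (mode and inverse Hessian); here they are given as inputs.
    """
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(x0)
    for _ in range(steps):
        # Draw prompt embeddings from the assumed Gaussian prompt distribution.
        cs = rng.multivariate_normal(mu, cov, size=n_samples)
        # Monte Carlo estimate of the expected gradient over sampled prompts.
        grad = np.mean([toy_loss_grad(x0 + delta, c) for c in cs], axis=0)
        # Signed ascent step, then projection onto the L-infinity ball.
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return delta


if __name__ == "__main__":
    x0 = np.full(4, 0.5)       # toy "image" with 4 pixels
    mu = np.zeros(4)           # assumed mean prompt embedding
    cov = 0.01 * np.eye(4)     # assumed covariance (inverse Hessian)
    print(expected_disturbance_pgd(x0, mu, cov))
```

Because the toy loss is a squared distance, the perturbation simply pushes every pixel away from the mean prompt embedding until it reaches the budget `eps`. With a real diffusion model, the per-sample gradient would instead come from backpropagation through the denoising loss conditioned on each sampled prompt.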

Paper Structure

This paper contains 52 sections, 2 theorems, 47 equations, 15 figures, 16 tables, and 3 algorithms.

Introduction
Related Work
Prompt-Agnostic Adversarial Perturbation
  Background and Motivation
  PAP: Prompt-Agnostic Perturbation by Prompt Distribution Modeling
    Modeling the Prompt Distribution $Q_{(x_0,c_0)}$
      Laplace Modeling
      Parameter Estimators
    Maximizing the Disturbance Expectation
Experiments
  Experimental Setup
  Comparison with State-of-the-Art Methods
    Face Privacy Protection
    Style Imitation
  Ablation Study

Key Result

Theorem A.1

Assume $g:\mathbb{R}^{m \times m} \to \mathbb{R}$ is Lipschitz continuous under $L_1$ norm. Then, as $n \to \infty$, we have where $x_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, I)$ for $i=1:n$.

Figures (15)

Figure 1: Illustration of a portrait with (a) no defense, (b) a prompt-specific perturbation, and (c) our PAP prompt-agnostic perturbation. In (a), the portrait is easily tampered with by the diffusion model. In (b), prompt-specific methods perform well only on the learned prompt (i.e., Prompt A) and are ineffective against unseen prompts (i.e., Prompts B and C). In (c), the proposed PAP is robust to both seen and unseen prompts and successfully protects the portrait from tampering by the diffusion model.
Figure 2: Qualitative defense results of different methods on VGGFace2 (left) and Wikiart (right). Each row represents a method, and each column represents a different test prompt (shown at the bottom). The adversarial examples generated by our method effectively defend against all prompts on both datasets. In contrast, other baselines primarily focus on protecting the fixed prompt (the first column), resulting in weaker defense against other prompts.
Figure 3: Defense performance of different methods in prompt variation settings. The x-axis represents the number of prompt categories multiplied by the number of generated images per prompt: 4$\times$20, 8$\times$10, 10$\times$8, 16$\times$5, and 20$\times$4. The y-axis displays the values of different metrics.
Figure 4: Cosine dissimilarity between $\hat{H}^{-1}$ and $H^{-1}$ under different settings of $D$ and $l$.
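Figure 4 measures how far the estimated inverse Hessian $\hat{H}^{-1}$ is from the true $H^{-1}$. A minimal sketch of one common definition follows, assuming both matrices are flattened and compared as vectors, with dissimilarity taken as 1 minus their cosine similarity (0 for matrices pointing in the same direction, up to 2 for opposite ones). The paper's exact definition, and the meaning of $D$ and $l$, are not stated here, so the function name and the noisy estimate in the example are hypothetical.

```python
import numpy as np


def cosine_dissimilarity(a, b):
    """Return 1 - cos(a, b), treating each matrix as a flattened vector."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    H = np.array([[2.0, 0.5], [0.5, 1.0]])
    H_inv = np.linalg.inv(H)
    # Hypothetical noisy estimate standing in for \hat{H}^{-1}.
    noise = 0.01 * np.random.default_rng(0).standard_normal(H.shape)
    H_inv_hat = H_inv + noise
    print(cosine_dissimilarity(H_inv_hat, H_inv))  # small for a good estimate
```

Flattening makes the measure insensitive to an overall scale factor, so it captures whether the estimate has the right shape (relative curvature across directions) rather than the right magnitude.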
Figure 5: Visualized results of different test prompts against the Anti-DB method on the CelebA-HQ and Wikiart datasets, with training prompts "a photo of sks person" (top) and "a sks painting" (bottom). The left column shows the adversarial examples (denoted AE) generated by Anti-DB.

Theorems & Definitions (5)

Definition 3.1
Theorem A.1
Proof
Corollary A.2
Definition A.3