FoolSDEdit: Deceptively Steering Your Edits Towards Targeted Attribute-aware Distribution

Qi Zhou; Dongxia Wang; Tianlin Li; Zhihong Xu; Yang Liu; Kui Ren; Wenhai Wang; Qing Guo

TL;DR

The authors uncover a distribution-level vulnerability in diffusion-guided editing (SDEdit), showing that the data distribution $p_\text{data}$ can drift toward unintended attributes. They formulate Targeted Attribute Generative Attack (TAGA) to induce a target attribute $\hat{a}$ in the generated distribution by perturbing the guided image, while preserving the input attribute. To realize TAGA, they first establish that naive additive perturbations are insufficient and that natural degradations like exposure and motion blur can effectively shift attributes; this motivates FoolSDEdit, which optimizes an execution strategy via SuperPert, an architecture-search graph blending multiple perturbations. Through bi-level optimization and extensive tests on CelebA-HQ and FFHQ across gender, age, and race attributes, FoolSDEdit achieves a pronounced shift toward targeted attributes with competitive image quality, exposing a practical vulnerability in SDEdit and highlighting the need for defense against distribution-level attacks in diffusion-based editing systems.

Abstract

Guided image synthesis methods, like SDEdit based on the diffusion model, excel at creating realistic images from user inputs such as stroke paintings. However, existing efforts mainly focus on image quality, often overlooking a key point: the diffusion model represents a data distribution, not individual images. This introduces a low but critical chance of generating images that contradict user intentions, raising ethical concerns. For example, a user inputting a stroke painting with female characteristics might, with some probability, get male faces from SDEdit. To expose this potential vulnerability, we aim to build an adversarial attack forcing SDEdit to generate a specific data distribution aligned with a specified attribute (e.g., female), without changing the input's attribute characteristics. We propose the Targeted Attribute Generative Attack (TAGA), using an attribute-aware objective function and optimizing the adversarial noise added to the input stroke painting. Empirical studies reveal that traditional adversarial noise struggles with TAGA, while natural perturbations like exposure and motion blur easily alter generated images' attributes. To execute effective attacks, we introduce FoolSDEdit. First, we design a joint adversarial exposure and blur attack that adds exposure and motion blur to the stroke painting and optimizes them together. Second, we optimize the execution strategy of the various perturbations, framing it as a network architecture search problem: we create SuperPert, a graph representing diverse execution strategies for different perturbations, and after training we obtain the optimized execution strategy for effective TAGA against SDEdit. Comprehensive experiments on two datasets show our method compelling SDEdit to generate a targeted attribute-aware data distribution, significantly outperforming baselines.
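To make the two natural perturbations named in the abstract concrete, the following is a minimal NumPy sketch of applying a pixel-wise exposure map and a linear motion-blur kernel to a stroke painting. It is an illustration under stated assumptions, not the authors' implementation: the function names, the multiplicative exposure model, the line-shaped kernel, and the exposure-then-blur order are all hypothetical choices, and the adversarial optimization of these parameters (and the SuperPert search over execution strategies) is not shown.

```python
import numpy as np


def apply_exposure(image, exposure_map):
    """Scale an HxWxC image in [0, 1] by an HxW exposure map, then clip.

    The multiplicative model is an assumption made for this sketch.
    """
    return np.clip(image * exposure_map[..., None], 0.0, 1.0)


def linear_motion_kernel(size, angle_deg):
    """Build a normalized line-shaped kernel of odd `size` at `angle_deg`."""
    kernel = np.zeros((size, size), dtype=np.float64)
    center = size // 2
    theta = np.deg2rad(angle_deg)
    # Rasterize a line through the kernel center along the blur direction.
    for t in np.linspace(-center, center, 4 * size):
        row = int(round(center - t * np.sin(theta)))
        col = int(round(center + t * np.cos(theta)))
        kernel[row, col] = 1.0
    return kernel / kernel.sum()


def apply_motion_blur(image, kernel):
    """Filter every channel of an HxWxC image with `kernel` (edge padding)."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros(image.shape, dtype=np.float64)
    h, w = image.shape[:2]
    # Accumulate shifted copies of the image weighted by the kernel entries.
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * padded[i:i + h, j:j + w, :]
    return out


def perturb_stroke_painting(image, exposure_map, kernel):
    """Apply exposure first, then blur (the order is an assumption)."""
    return apply_motion_blur(apply_exposure(image, exposure_map), kernel)


# Usage with a random stand-in for a stroke painting.
rng = np.random.default_rng(0)
stroke = rng.random((32, 32, 3))
exposure = np.full((32, 32), 1.2)           # uniform brightening
kernel = linear_motion_kernel(5, 0.0)       # horizontal blur
perturbed = perturb_stroke_painting(stroke, exposure, kernel)
```

In an attack setting, the exposure map and kernel would be treated as the variables to optimize against an attribute-aware objective rather than fixed by hand as they are here.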

Paper Structure (17 sections, 9 equations, 6 figures, 3 tables)

Introduction
Related Work
Background and Analysis
Targeted Attribute Generative Attack
Problem Formulation
Naive Implementation
Empirical Study and Motivation
FoolSDEdit
Joint Adv. Exposure and Blur for TAGA
SuperPert
Optimization
Experimental Results
Experimental Setups
Attack on Gender Attribute
Attack on Age and Race Attributes
...and 2 more sections

Figures (6)

Figure 1: Examples of our method and comparative results before and after our attack.
Figure 2: Adding adversarial noise from PGD, random motion blur, and random exposure to the mixing stroke paintings in Fig. \ref{fig:analysis} (a).
Figure 3: Pipeline of SuperPert. $\mathbf{X}^\text{g}$ is the input stroke painting and $\mathbf{X}^{\text{g}'}$ is the adversarial perturbed stroke painting.
Figure 4: Top: Female to male transition example, Bottom: Male to female transition example.
Figure 5: Top: Young to senior transition example, Bottom: Senior to young transition example.
...and 1 more figures
