FoolSDEdit: Deceptively Steering Your Edits Towards Targeted Attribute-aware Distribution
Qi Zhou, Dongxia Wang, Tianlin Li, Zhihong Xu, Yang Liu, Kui Ren, Wenhai Wang, Qing Guo
TL;DR
The authors uncover a distribution-level vulnerability in diffusion-guided editing (SDEdit), showing that the data distribution $p_\text{data}$ can drift toward unintended attributes. They formulate Targeted Attribute Generative Attack (TAGA) to induce a target attribute $\hat{a}$ in the generated distribution by perturbing the guided image, while preserving the input attribute. To realize TAGA, they first establish that naive additive perturbations are insufficient and that natural degradations like exposure and motion blur can effectively shift attributes; this motivates FoolSDEdit, which optimizes an execution strategy via SuperPert, an architecture-search graph blending multiple perturbations. Through bi-level optimization and extensive tests on CelebA-HQ and FFHQ across gender, age, and race attributes, FoolSDEdit achieves a pronounced shift toward targeted attributes with competitive image quality, exposing a practical vulnerability in SDEdit and highlighting the need for defense against distribution-level attacks in diffusion-based editing systems.
Abstract
Guided image synthesis methods, like SDEdit based on the diffusion model, excel at creating realistic images from user inputs such as stroke paintings. However, existing efforts mainly focus on image quality, often overlooking a key point: the diffusion model represents a data distribution, not individual images. This introduces a low but critical chance of generating images that contradict user intentions, raising ethical concerns. For example, a user inputting a stroke painting with female characteristics might, with some probability, get male faces from SDEdit. To expose this potential vulnerability, we aim to build an adversarial attack that forces SDEdit to generate a specific data distribution aligned with a specified attribute (e.g., female), without changing the input's attribute characteristics. We propose the Targeted Attribute Generative Attack (TAGA), which uses an attribute-aware objective function and optimizes the adversarial noise added to the input stroke painting. Empirical studies reveal that traditional adversarial noise struggles with TAGA, while natural perturbations like exposure and motion blur easily alter the attributes of generated images. To execute effective attacks, we introduce FoolSDEdit. First, we design a joint adversarial exposure and blur attack, adding exposure and motion blur to the stroke painting and optimizing them together. Second, we optimize the execution strategy of the various perturbations, framing it as a network architecture search problem. Third, we create SuperPert, a graph representing diverse execution strategies for different perturbations. After training, we obtain the optimized execution strategy for effective TAGA against SDEdit. Comprehensive experiments on two datasets show that our method compels SDEdit to generate a targeted attribute-aware data distribution, significantly outperforming baselines.
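To make the two natural perturbations named in the abstract concrete, the following is a minimal NumPy sketch of applying an exposure change and a motion blur to a stroke painting. It is an illustration under stated assumptions, not the authors' implementation: the function names, the scalar `gain` exposure model, the line-shaped blur kernel, and the fixed exposure-then-blur order are all hypothetical choices made here. In FoolSDEdit the perturbation parameters and their execution order are optimized against the attribute-aware objective, which this sketch does not reproduce.

```python
import numpy as np


def apply_exposure(image, gain):
    """Scale pixel intensities by `gain` (scalar or per-pixel map) and clip to [0, 1].

    Assumption: a simple multiplicative exposure model chosen for illustration.
    """
    return np.clip(image * gain, 0.0, 1.0)


def motion_blur_kernel(length, angle_deg):
    """Build a normalized line-shaped kernel of odd size `length` at `angle_deg`."""
    if length % 2 == 0:
        raise ValueError("length must be odd")
    kernel = np.zeros((length, length), dtype=np.float64)
    center = length // 2
    theta = np.deg2rad(angle_deg)
    # Mark the pixels lying on a line through the kernel center.
    for t in np.linspace(-center, center, 2 * length):
        row = int(round(center - t * np.sin(theta)))
        col = int(round(center + t * np.cos(theta)))
        kernel[row, col] = 1.0
    return kernel / kernel.sum()


def apply_motion_blur(image, kernel):
    """Filter each channel of an HxWxC image with `kernel`, using edge padding."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = image.shape[:2]
    out = np.zeros(image.shape, dtype=np.float64)
    # Accumulate shifted copies of the image weighted by the kernel entries.
    for i in range(k):
        for j in range(k):
            if kernel[i, j] != 0.0:
                out += kernel[i, j] * padded[i:i + h, j:j + w, :]
    return out


def perturb_stroke_painting(image, gain, kernel):
    """Hypothetical joint perturbation: exposure first, then motion blur."""
    return apply_motion_blur(apply_exposure(image, gain), kernel)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stroke = rng.random((64, 64, 3))  # stand-in for a stroke painting in [0, 1]
    kernel = motion_blur_kernel(length=7, angle_deg=30.0)
    perturbed = perturb_stroke_painting(stroke, gain=1.3, kernel=kernel)
    print(perturbed.shape)
```

In an actual attack, `gain` and the kernel would be differentiable parameters updated by gradient steps through the editing pipeline; here they are fixed values so that the perturbation operators themselves can be inspected in isolation.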
