REDEditing: Relationship-Driven Precise Backdoor Poisoning on Text-to-Image Diffusion Models

Chongye Guo, Jinhu Fu, Junfeng Fang, Kun Wang, Guorui Feng

TL;DR

The paper addresses security vulnerabilities in text-to-image diffusion models that arise from training-free backdoor poisoning via model editing. It introduces REDEditing, a relationship-driven poisoning framework that uses equivalent-attribute alignment and a stealthy poisoning constraint to bind triggers precisely to toxic concepts while preserving benign generation. The approach combines equivalent relationship retrieval, joint-attribute transfer, and a knowledge isolation constraint. It reports an 11% higher attack success rate than state-of-the-art approaches and a 24% improvement in backdoor stealthiness from a single added line of code. Empirical results across multiple Stable Diffusion versions demonstrate the practical risk and underscore the need for defenses against model-editing-based backdoors in editable image generation systems.

Abstract

The rapid advancement of generative AI highlights the importance of text-to-image (T2I) security, particularly given the threat of backdoor poisoning. Timely disclosure and mitigation of security vulnerabilities in T2I models are crucial for the safe deployment of generative models. We explore a novel training-free backdoor poisoning paradigm based on model editing, a technique that has recently been employed for knowledge updating in large language models. In doing so, we reveal the security risks that model editing techniques pose to image generation models. In this work, we establish the principles for backdoor attacks based on model editing and propose a relationship-driven precise backdoor poisoning method, REDEditing. Drawing on the principles of equivalent-attribute alignment and stealthy poisoning, we develop an equivalent relationship retrieval and joint-attribute transfer approach that ensures consistent backdoor image generation through concept rebinding. A knowledge isolation constraint is proposed to preserve benign generation integrity. Our method achieves an 11% higher attack success rate than state-of-the-art approaches. Remarkably, adding just one line of code enhances output naturalness while improving backdoor stealthiness by 24%. This work aims to heighten awareness of this security vulnerability in editable image generation models.

Paper Structure

This paper contains 19 sections, 14 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Difference between EvilEdit and our REDEditing. In the case of using ‘cat’ as the trigger to insert the ‘zebra’ concept, the prompt is “a man feeds a cat.” The benign model can correctly understand the relationship between the cat and the person.
  • Figure 2: Overview of our backdoor attack method REDEditing. (I) Equivalent-Relationship Retrieval: Extracts the equivalent relationship field for the trigger and backdoor concepts via prompt engineering, creating logically consistent attribute pairs. (II) Joint-Attribute Transfer: Measures semantic relevance and selects consistent attributes and irrelevant knowledge. (III) Precise Backdoor Poisoning: Injects toxic concepts into cross-attention weights via joint editing while remaining stealthy.
  • Figure 3: Visualization of backdoor attack performance on SD v1.5. The first row shows the images generated by the benign model, and the second row shows the images from EvilEdit. The red boxes highlight the unreasonable visual areas. In the third row, our method generates toxic images with better logical consistency and visual naturalness.
  • Figure 4: Comparison of images generated by the original benign model and the backdoored model under benign prompts. The first row shows the benign images generated by the benign model, and the second row shows the benign images from the backdoored model attacked by EvilEdit. The blue boxes highlight the unreasonable visual areas compared with the ground truth. The third row shows the results from the model attacked by REDEditing.
  • Figure 5: (a) Visualization of the perturbation of benign and backdoor outputs before and after the REDEditing attack. Note that numerous dots overlap; viewing the plot in color helps to distinguish the overlapping yellow and green ones. (b) Comparison of backdoor attack metrics between REDEditing and SOTA methods.
  • ...and 1 more figure
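
The captions describe editing cross-attention weights while keeping unrelated knowledge intact, but the paper's own objective and equations are not reproduced in this listing. As a rough illustration of the general idea only, the sketch below applies a generic closed-form (ridge-regression) edit to a single linear projection, in the style commonly used in training-free model editing. It runs on random toy matrices rather than a real model. The function name `edit_projection`, the variable names, the dimensions, and the regularizer `lam` are all assumptions made for this example and do not come from the paper.

```python
import numpy as np

def edit_projection(W0, K_edit, V_target, K_keep, lam=1e-6):
    """Generic closed-form edit of a linear map (illustrative, not the paper's method).

    Minimizes  sum ||W k_edit - v_target||^2
             + sum ||W k_keep - W0 k_keep||^2
             + lam * ||W - W0||_F^2.

    W0:       (d_out, d_in) original projection
    K_edit:   (n, d_in)     inputs whose outputs should change
    V_target: (n, d_out)    desired outputs for K_edit
    K_keep:   (m, d_in)     inputs whose outputs should be preserved
    """
    d_in = W0.shape[1]
    # Right-hand side: target outputs, preserved outputs, and the pull toward W0.
    A = V_target.T @ K_edit + W0 @ (K_keep.T @ K_keep) + lam * W0
    # Gram matrix of all keys plus the ridge term (symmetric positive definite).
    B = K_edit.T @ K_edit + K_keep.T @ K_keep + lam * np.eye(d_in)
    # Solve W B = A. Because B is symmetric, W^T = solve(B, A^T).
    return np.linalg.solve(B, A.T).T

# Toy data: random vectors stand in for text-encoder embeddings.
rng = np.random.default_rng(0)
W0 = rng.normal(size=(8, 16))       # hypothetical projection matrix
k_edit = rng.normal(size=(1, 16))   # key whose output is redirected
k_ref = rng.normal(size=(1, 16))    # key whose original output is the target
K_keep = rng.normal(size=(4, 16))   # keys that should behave as before

V_target = k_ref @ W0.T             # rebind k_edit to the output of k_ref
W = edit_projection(W0, k_edit, V_target, K_keep)

edit_error = np.abs(k_edit @ W.T - V_target).max()
keep_error = np.abs(K_keep @ W.T - K_keep @ W0.T).max()
print(f"max edit error: {edit_error:.2e}, max preservation error: {keep_error:.2e}")
```

The preservation term plays the role that a knowledge isolation constraint would play in this simplified setting: outputs for the listed `K_keep` inputs stay close to their originals while the edited key is remapped. The same structure suggests one direction for defenses, since comparing a model's projection outputs on reference prompts before and after an edit can expose redirected keys.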