Invisible Backdoor Attacks on Diffusion Models

Sen Li; Junchi Ma; Minhao Cheng

TL;DR

This work introduces a versatile bi-level optimization framework to inject invisible image triggers into both unconditional and conditional diffusion models, enabling targeted outputs when triggers are present while preserving normal performance otherwise. The trigger generator learns input-aware perturbations and can produce universal or distribution-based invisible triggers. The framework applies to text-guided editing and image inpainting, and extends to model watermarking for ownership verification. Empirically, the approach achieves high utility (FID comparable to clean models) and high specificity (low MSE for target outputs) across CIFAR10, CELEBA-HQ, and MS COCO, using DDIM and other samplers, with strong stealth against countermeasures such as ANP, clipping, and Elijah. The results underscore significant security implications for diffusion-model deployments and motivate the development of defense methods and faster training strategies for practical use.
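To make the two ideas in the summary concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that "invisible" means the perturbation is clipped to a small L-infinity radius `eps` (an assumption; the summary does not state the constraint), and it computes specificity as the MSE between an output and the backdoor target. The function names `apply_invisible_trigger` and `mse`, the value of `eps`, and the random stand-in for a learned trigger are all hypothetical. The bi-level training of the trigger generator is not shown.

```python
import random


def apply_invisible_trigger(noise, delta, eps):
    """Add a perturbation to the input noise, clipping each component
    of delta to [-eps, eps] (an assumed L-infinity invisibility bound)."""
    if len(noise) != len(delta):
        raise ValueError("noise and delta must have the same length")
    return [x + max(-eps, min(eps, d)) for x, d in zip(noise, delta)]


def mse(output, target):
    """Mean squared error between a generated output and the target,
    used here as the specificity measure (lower is better)."""
    if len(output) != len(target):
        raise ValueError("output and target must have the same length")
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)


# Flattened Gaussian input noise, standing in for a sampler's initial x_T.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(16)]

# Random stand-in for the output of a learned trigger generator.
delta = [rng.uniform(-1.0, 1.0) for _ in range(16)]

triggered = apply_invisible_trigger(noise, delta, eps=0.05)

# The triggered noise never moves more than eps from the clean noise.
max_shift = max(abs(a - b) for a, b in zip(triggered, noise))
```

In this sketch the bound on `max_shift` is what keeps the trigger visually negligible, while a low `mse` against the target on triggered inputs would indicate a successful backdoor.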

Abstract

In recent years, diffusion models have achieved remarkable success in the realm of high-quality image generation, garnering increased attention. This surge in interest is paralleled by a growing concern over the security threats associated with diffusion models, largely attributed to their susceptibility to malicious exploitation. Notably, recent research has brought to light the vulnerability of diffusion models to backdoor attacks, enabling the generation of specific target images through corresponding triggers. However, prevailing backdoor attack methods rely on manually crafted trigger generation functions, often manifesting as discernible patterns incorporated into input noise, thus rendering them susceptible to human detection. In this paper, we present an innovative and versatile optimization framework designed to acquire invisible triggers, enhancing the stealthiness and resilience of inserted backdoors. Our proposed framework is applicable to both unconditional and conditional diffusion models, and notably, we are the pioneers in demonstrating the backdooring of diffusion models within the context of text-guided image editing and inpainting pipelines. Moreover, we also show that the backdoors in the conditional generation can be directly applied to model watermarking for model ownership verification, which further boosts the significance of the proposed framework. Extensive experiments on various commonly used samplers and datasets verify the efficacy and stealthiness of the proposed framework. Our code is publicly available at https://github.com/invisibleTriggerDiffusion/invisible_triggers_for_diffusion.

Paper Structure

This paper contains 33 sections, 1 theorem, 19 equations, 15 figures, 11 tables, 2 algorithms.

Introduction
Related work
Diffusion models
Backdoor attacks on diffusion models
Methodology
Threat Model
Optimization framework for learnable invisible trigger
Backdooring unconditional diffusion models
Backdooring conditional diffusion models
Using invisible trigger as model watermarking
Experiments
Implementation details
Unconditional generation results
Universal backdoor triggers
Distribution based trigger results
...and 18 more sections

Key Result

Lemma 1

Let $q_\sigma(\bm{x}'_{1:T} \mid \bm{x}'_0)$ and $q_\sigma(\bm{x}'_{t-1} \mid \bm{x}'_t, \bm{x}'_0)$ be defined as in Equation (eq:ddim_backdoor). Then we have

Figures (15)

Figure 1: Illustration of our proposed invisible triggers, and visible triggers used in [chou2023backdoor, chou2023villandiffusion, chen2023trojdiff].
Figure 2: Examples of learnable invisible universal trigger on CIFAR10.
Figure 3: Examples of universal trigger on high-resolution dataset, CELEBA-HQ.
Figure 4: Examples of learnable trigger distribution on CIFAR10.
Figure 5: Examples of invisible input-aware triggers in conditional diffusion models. Given a masked image with no trigger, the conditional diffusion model behaves normally and edits the masked image according to the text description. However, if the invisible trigger is inserted into the masked image, the model outputs the target image regardless of the given text.
...and 10 more figures

Theorems & Definitions (2)

Lemma 1
proof
