UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models

Yuning Han, Bingyin Zhao, Rui Chu, Feng Luo, Biplab Sikdar, Yingjie Lao

TL;DR

UIBDiffusion introduces a universal imperceptible backdoor for diffusion models by adapting universal adversarial perturbations (UAPs) into a trainable trigger generator guided by a classifier. The method yields image-agnostic, model-agnostic triggers that preserve diffusion-generation quality while achieving near-perfect attack success at low poison rates, and it remains robust against state-of-the-art trigger-inversion defenses like Elijah and TERD. Empirical results across multiple diffusion models and samplers on CIFAR-10 and CelebA-HQ-256 demonstrate universality, high efficacy, and strong imperceptibility. This work uncovers a substantial security risk in diffusion-model deployment and underscores the need for defenses targeting imperceptible, universal backdoors.

Abstract

Recent studies show that diffusion models (DMs) are vulnerable to backdoor attacks. Existing backdoor attacks rely on unconcealed triggers (e.g., a gray box or eyeglasses) with evident patterns, which yield strong attack performance but are easily detected by human inspection and defensive algorithms. While it is possible to improve stealthiness by reducing the strength of the backdoor, doing so can significantly compromise its generality and effectiveness. In this paper, we propose UIBDiffusion, a universal imperceptible backdoor attack for diffusion models, which achieves superior attack and generation performance while evading state-of-the-art defenses. We propose a novel trigger generation approach based on universal adversarial perturbations (UAPs) and reveal that such perturbations, which were initially devised for fooling pre-trained discriminative models, can be adapted as potent imperceptible backdoor triggers for DMs. We evaluate UIBDiffusion on multiple types of DMs with different kinds of samplers across various datasets and targets. Experimental results demonstrate that UIBDiffusion brings three advantages: 1) Universality: the imperceptible trigger is universal (i.e., image- and model-agnostic), so a single trigger is effective on any image and on all diffusion models with different samplers; 2) Utility: it achieves comparable generation quality (e.g., FID) and an even better attack success rate (ASR) at low poison rates compared to prior works; and 3) Undetectability: the UIBDiffusion trigger is imperceptible to human observers and can bypass Elijah and TERD, the SOTA defenses against backdoors for DMs. We will release our backdoor triggers and code.

Paper Structure

This paper contains 28 sections, 18 equations, 19 figures, 9 tables, 2 algorithms.

Figures (19)

  • Figure 1: Illustrations of the forward diffusion process (top block with the yellow background) and backward diffusion process (bottom block with the blue background) of a clean diffusion model (blue dashed line), VillanDiffusion (Chou et al., 2024) (red dashed line) and UIBDiffusion (ours, green dashed line). The UIBDiffusion trigger ($\uptau$) is imperceptible to humans in all phases, from data poisoning to forward diffusion and backward diffusion, while the glasses trigger ($\mathbf{g}$) in prior works is perceptible. The UIBDiffusion trigger is highly effective since it introduces a distribution shift similar to that of the glasses trigger, which secures the attack performance during the backward diffusion process. However, it is hard to detect, as the trigger does not possess specific patterns and cannot be inverted by existing defensive algorithms. We empirically verify the effectiveness in the experiments section.
  • Figure 2: Illustration of trigger mechanisms. UIBDiffusion triggers introduce a distribution shift similar to that of triggers in prior works, which secures high attack performance. However, since our triggers do not have a specific pattern, they are difficult for trigger inversion algorithms to reconstruct.
  • Figure 3: FID and MSE comparison of BadDiffusion, VillanDiffusion and UIBDiffusion over different poison rates.
  • Figure 4: UIBDiffusion performance with eleven different samplers across various poison rates. UIBDiffusion yields consistently strong ASR, SSIM and MSE results, and even better FID on ODE-based samplers than on SDE-based samplers.
  • Figure 5: Performance comparison of BadDiffusion, VillanDiffusion and our UIBDiffusion before and after the Elijah defense. ASR: the higher the better; FID: the lower the better; MSE: the lower the better; SSIM: the higher the better.
  • ...and 14 more figures
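
The captions above report MSE and ASR without defining them on this page. As a rough sketch only, the snippet below shows one way these two metrics could be computed for a batch of images sampled from triggered inputs. It assumes the convention common in earlier diffusion-backdoor evaluations, where ASR is the fraction of samples whose MSE to the backdoor target falls below a small threshold. The function names, the threshold value of 1e-3, and the array layout are illustrative assumptions, not details taken from this paper.

```python
import numpy as np


def per_sample_mse(generated, target):
    """Mean squared error between each generated image and one target image.

    generated: array of shape (N, H, W, C); target: array of shape (H, W, C).
    Both are assumed to share the same value range (hypothetical layout).
    """
    generated = np.asarray(generated, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    diff = generated - target  # target broadcasts over the batch axis
    return (diff ** 2).reshape(len(generated), -1).mean(axis=1)


def attack_success_rate(generated, target, threshold=1e-3):
    """Fraction of samples whose MSE to the target is below `threshold`.

    The threshold is an assumed, illustrative value.
    """
    return float((per_sample_mse(generated, target) < threshold).mean())


if __name__ == "__main__":
    target = np.zeros((4, 4, 3))
    # Three samples match the target exactly; one is far from it.
    batch = np.stack([target, target, target, np.ones((4, 4, 3))])
    print(per_sample_mse(batch, target))
    print(attack_success_rate(batch, target))
```

Under this convention, a lower MSE and a higher ASR both indicate that triggered sampling reproduces the target more reliably, which matches the directions stated in the Figure 5 caption.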