Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion

Hossein Souri; Arpit Bansal; Hamid Kazemi; Liam Fowl; Aniruddha Saha; Jonas Geiping; Andrew Gordon Wilson; Rama Chellappa; Tom Goldstein; Micah Goldblum

TL;DR

This work addresses the security risk of data poisoning and backdoor attacks stemming from web-scraped datasets. It introduces Guided Diffusion Poisoning (GDP), which synthesizes clean-label base samples from scratch using guided diffusion to maximize downstream attack potency while preserving natural appearance. GDP significantly boosts targeted poisoning and backdoor success across CIFAR-10 and ImageNet, with small poison budgets, and remains effective against several defenses and in black-box settings. The findings highlight the need for robust data curation and defense strategies, given that base-sample design can drastically improve attack efficacy and transferability.

Abstract

Modern neural networks are often trained on massive datasets that are web scraped with minimal human inspection. As a result of this insecure curation pipeline, an adversary can poison or backdoor the resulting model by uploading malicious data to the internet and waiting for a victim to scrape and train on it. Existing approaches for creating poisons and backdoors start with randomly sampled clean data, called base samples, and then modify those samples to craft poisons. However, some base samples may be significantly more amenable to poisoning than others. As a result, we may be able to craft more potent poisons by carefully choosing the base samples. In this work, we use guided diffusion to synthesize base samples from scratch that lead to significantly more potent poisons and backdoors than previous state-of-the-art attacks. Our Guided Diffusion Poisoning (GDP) base samples can be combined with any downstream poisoning or backdoor attack to boost its effectiveness. Our implementation code is publicly available at: https://github.com/hsouri/GDP .

Paper Structure (26 sections, 10 equations, 15 figures, 15 tables, 1 algorithm)

Introduction
Related Work
Data Poisoning and Backdoor Attacks
Guidance in Diffusion Models
Background
Poisoning Setup
Universal Guidance
Method
Threat Model
Attack Workflow
Experimental Evaluations
Potent Poisons, Even in Small Quantities
Not Only Potent, but Also Stealthy
Not Only Potent, but Also Transferable
Defenses and Mitigation Strategies
...and 11 more sections

Figures (15)

Figure 1: Schematic of Guided Diffusion Poisoning (GDP). GDP contains three stages: (1) generate base samples with a diffusion model weakly guided using a poisoning loss; (2) use the base samples as initialization for a downstream poisoning algorithm; (3) select poisons with the lowest poisoning loss and include them in the poisoned training set.
Figure 2: GDP base samples are clean-label and high quality (ImageNet). In each panel, the leftmost column contains a random sample from the poison class, the second column contains the target image, and the subsequent three columns contain GDP base samples. Experiments conducted using the Witches' Brew gradient-matching objective with a ResNet-18 model on ImageNet over randomly sampled poison class and target image pairs. Additional visualizations are found in the appendix.
Figure 3: GDP base samples are clean-label and high quality (CIFAR-10). In each panel, the leftmost column contains a random sample from the poison class, the second column contains the target image, and the subsequent three columns contain GDP base samples. Experiments conducted using the Witches' Brew gradient-matching objective with a ResNet-18 model on CIFAR-10 over randomly sampled poison class and target image pairs. Additional visualizations are found in the appendix.
Figure 4: GDP produces base samples that look like the target image while still remaining in the poison class. We generate base samples from different poison classes while the target image is fixed. We see that all resulting GDP base samples contain similar colors to the target image but remain clean-label. Experiments conducted on the CIFAR-10 dataset using the Witches' Brew poisoning objective along with a ResNet-18 model.
Figure 5: Visualizations of the triggered test images from the ImageNet dataset.
...and 10 more figures
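
The three stages named in the Figure 1 caption can be illustrated with a toy sketch. This is not the authors' implementation: the diffusion model, the downstream attack, and the poisoning loss are all replaced by hypothetical stand-ins (`generate_base_samples`, `craft_poison`, `toy_poisoning_loss`) that operate on short lists of floats, and the perturbation bound `eps` and the `budget` are assumed values. Only the control flow (generate, craft, then keep the lowest-loss candidates) mirrors the caption.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def toy_poisoning_loss(sample: Sequence[float], target: Sequence[float]) -> float:
    """Hypothetical stand-in for a poisoning loss: squared distance to a target."""
    return sum((s - t) ** 2 for s, t in zip(sample, target))


def generate_base_samples(n: int, dim: int, rng: random.Random) -> List[Vector]:
    """Stage 1 placeholder: a real attack would sample from a guided diffusion model."""
    return [[rng.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def craft_poison(base: Sequence[float], target: Sequence[float], eps: float) -> Vector:
    """Stage 2 placeholder: shift each coordinate toward the target by at most eps."""
    return [b + max(-eps, min(eps, t - b)) for b, t in zip(base, target)]


def select_poisons(
    candidates: List[Vector],
    loss_fn: Callable[[Vector], float],
    budget: int,
) -> List[Vector]:
    """Stage 3: keep the `budget` candidates with the lowest poisoning loss."""
    return sorted(candidates, key=loss_fn)[:budget]


rng = random.Random(0)          # fixed seed so the toy run is repeatable
target = [0.5] * 4              # assumed toy "target" representation
bases = generate_base_samples(n=20, dim=4, rng=rng)
poisons = [craft_poison(b, target, eps=0.05) for b in bases]
selected = select_poisons(poisons, lambda p: toy_poisoning_loss(p, target), budget=5)
```

Generating more candidates than the budget and filtering afterwards is what makes the selection step meaningful: with `n` equal to `budget`, stage 3 would keep everything.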
