PureGen: Universal Data Purification for Train-Time Poison Defense via Generative Model Dynamics

Sunay Bhat; Jeffrey Jiang; Omead Pooladzandi; Alexander Branch; Gregory Pottie

TL;DR

PureGen introduces universal data purification for train-time poisoning by applying stochastic transforms through Energy-Based Models and Denoising Diffusion Probabilistic Models. By performing mid-run Langevin and diffusion-based purification, poisoned samples are guided back toward the natural image manifold with minimal loss to natural accuracy, yielding state-of-the-art defense across multiple attacks and datasets. The approach remains effective under distributional shifts in generative-model training data and can be extended via combinations to handle higher-power poisons, making it a practical preprocessing defense with broad applicability. Overall, PureGen provides a robust, attack-agnostic defense that reduces the computational and architectural burden of traditional defenses while enhancing model reliability in security-sensitive deployments.

Abstract

Train-time data poisoning attacks threaten machine learning models by introducing adversarial examples during training, leading to misclassification. Current defense methods often reduce generalization performance, are attack-specific, and impose significant training overhead. To address this, we introduce a set of universal data purification methods using a stochastic transform, $Ψ(x)$, realized via iterative Langevin dynamics of Energy-Based Models (EBMs), Denoising Diffusion Probabilistic Models (DDPMs), or both. These approaches purify poisoned data with minimal impact on classifier generalization. Our specially trained EBMs and DDPMs provide state-of-the-art defense against various attacks (including Narcissus, Bullseye Polytope, Gradient Matching) on CIFAR-10, Tiny-ImageNet, and CINIC-10, without needing attack or classifier-specific information. We discuss performance trade-offs and show that our methods remain highly effective even with poisoned or distributionally shifted generative model training data.
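To make the idea of a stochastic transform realized by iterative Langevin dynamics more concrete, the following is a minimal sketch, not the authors' implementation. It uses the standard unadjusted Langevin update, x ← x − η∇E(x) + √(2η)·z, on a toy quadratic energy. The names `toy_energy_grad` and `langevin_purify`, the step size, and the step count are all hypothetical; a real purifier would use the gradient of a trained EBM in place of the toy energy.

```python
import numpy as np

def toy_energy_grad(x, mean=0.5, scale=0.1):
    """Gradient of the toy energy E(x) = ||x - mean||^2 / (2 * scale^2).

    Stand-in (assumption) for the gradient of a trained EBM's energy.
    """
    return (x - mean) / scale**2

def langevin_purify(x, grad_fn, n_steps=150, step_size=1e-4, seed=0):
    """Run n_steps of unadjusted Langevin dynamics starting from x.

    Each step moves x down the energy gradient and adds Gaussian noise,
    so the output is a random sample, not a deterministic projection.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x, dtype=np.float64)  # copy so the input is untouched
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - step_size * grad_fn(x) + np.sqrt(2.0 * step_size) * noise
    return x

# A hypothetical "poisoned" image sitting far from the low-energy region.
x_poisoned = np.full((8, 8, 3), 2.0)
x_purified = langevin_purify(x_poisoned, toy_energy_grad)
```

In this toy setting the purified sample drifts toward the low-energy region around `mean` while keeping the input's shape. Stopping after a moderate number of steps, instead of running the chain to convergence, is what keeps the sample close to its starting image.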

Paper Structure (43 sections, 16 equations, 12 figures, 10 tables, 2 algorithms)


Introduction
Related Work
Targeted Data Poisoning Attack
Train-Time Poison Defense Strategies
PureGen: Purifying Generative Dynamics against Poisoning Attacks
Energy-Based Models and PureGen-EBM
Diffusion Models and PureGen-DDPM
Classification with Stochastic Transformation
Erasing Poison Signals via Mid-Run MCMC
Experiments
Experimental Details
Benchmark Results
PureGen Robustness to Train Distribution Shift and Poisoning
PureGen Extensions on Higher Power Attacks
PureGen Timing and Limitations
...and 28 more sections

Figures (12)

Figure 1: (Top) The full PureGen pipeline, in which our method is applied as a preprocessing step with no further downstream changes to classifier training or inference. Poisoned images are moderately exaggerated for visibility. (Bottom Left) Energy distributions of clean, poisoned, and PureGen-purified images. Our methods push poisoned images via purification into the natural, clean image energy manifold. (Bottom Right) The removal of poison artifacts and the similarity of clean and poisoned images after purification using PureGen EBM and DDPM dynamics. The purified dataset results in SoTA defense and high classifier natural accuracy.
Figure 2: Plot of $\ell_2$ distances for PureGen-EBM (Left) and PureGen-DDPM (Right) between clean images and clean purified images (blue), clean images and poisoned purified images (green), and poisoned images and poisoned purified images (orange) at points on the Langevin dynamics trajectory. Purifying a poisoned image for fewer than 250 steps moves it closer to its clean image, with a minimum around 150 steps, preserving the natural image while removing the adversarial features.
Figure 3: PureGen-EBM vs. PureGen-DDPM with increasingly out-of-distribution training data (for generative model training) when purifying the target/attacked distribution, CIFAR-10. PureGen-EBM is much more robust to distributional shift in terms of natural accuracy, while both PureGen-EBM and PureGen-DDPM maintain SoTA poison defense across all train distributions. *CIFAR-10 is a "cheating" baseline, as clean versions of poisoned images are present in the training data.
Figure 4: (Top) We compare PureGen-DDPM forward steps with the standard DDPM, where 250 steps degrades images for purification but does not reach a noise prior. Note that all models are trained with the same linear $\beta$ schedule. (Bottom Left) Generated images from models with 250, 750, and 1000 (Standard) train forward steps, where it is clear that 250 steps does not generate realistic images. (Bottom Right) Significantly improved poison defense performance of PureGen-DDPM with 250 train steps, indicating a trade-off between data purification and generative capabilities.
Figure 5: PureGen-EBM purification with various MCMC steps
...and 7 more figures
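
The Figure 4 observation that 250 forward steps degrade an image without reaching a noise prior can be illustrated with a short calculation. This is a sketch under assumed values: the caption says only that the $\beta$ schedule is linear, so the endpoints 1e-4 and 0.02 over 1000 steps (common DDPM defaults) and the helper names `linear_betas` and `alpha_bar` are assumptions, not details from the paper.

```python
import numpy as np

def linear_betas(n_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule; the endpoint values are assumed defaults."""
    return np.linspace(beta_start, beta_end, n_steps)

def alpha_bar(betas):
    """Cumulative product of (1 - beta_t).

    Under the usual DDPM forward process, alpha_bar[t] is the fraction of
    the original image's variance that survives after t + 1 noising steps.
    """
    return np.cumprod(1.0 - betas)

abar = alpha_bar(linear_betas())
signal_at_250 = abar[249]    # after 250 forward steps
signal_at_1000 = abar[999]   # after the full 1000 steps

print(f"alpha_bar after 250 steps:  {signal_at_250:.4f}")
print(f"alpha_bar after 1000 steps: {signal_at_1000:.6f}")
```

Under these assumed values, a substantial share of the image signal remains after 250 steps, whereas after 1000 steps almost none does. This is consistent with the caption's point that a truncated forward process perturbs images enough for purification but is too short to support realistic generation from pure noise.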
