PureGen: Universal Data Purification for Train-Time Poison Defense via Generative Model Dynamics
Sunay Bhat, Jeffrey Jiang, Omead Pooladzandi, Alexander Branch, Gregory Pottie
TL;DR
PureGen is a set of universal data purification methods for defending against train-time poisoning. It applies a stochastic transform, realized through Energy-Based Models (EBMs) and Denoising Diffusion Probabilistic Models (DDPMs), to the training data before classifier training. Mid-run Langevin dynamics and diffusion-based purification move poisoned samples back toward the natural image manifold with minimal loss of natural accuracy, yielding state-of-the-art defense across multiple attacks and datasets. The approach remains effective when the generative model's training data is distributionally shifted, and the EBM and DDPM transforms can be combined to handle higher-power poisons. Because it is a preprocessing step that needs no attack- or classifier-specific information, PureGen avoids the training overhead and attack-specific design of existing defenses.
Abstract
Train-time data poisoning attacks threaten machine learning models by introducing adversarial examples during training, leading to misclassification. Current defense methods often reduce generalization performance, are attack-specific, and impose significant training overhead. To address these limitations, we introduce a set of universal data purification methods using a stochastic transform, $Ψ(x)$, realized via iterative Langevin dynamics of Energy-Based Models (EBMs), Denoising Diffusion Probabilistic Models (DDPMs), or both. These approaches purify poisoned data with minimal impact on classifier generalization. Our specially trained EBMs and DDPMs provide state-of-the-art defense against various attacks (including Narcissus, Bullseye Polytope, and Gradient Matching) on CIFAR-10, Tiny-ImageNet, and CINIC-10, without needing attack- or classifier-specific information. We discuss performance trade-offs and show that our methods remain highly effective even when the generative model's training data is poisoned or distributionally shifted.
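To make the idea of a Langevin-based stochastic transform concrete, here is a minimal sketch of iterative Langevin dynamics used as a purification step. It is an illustration under stated assumptions, not the authors' implementation: the function name `langevin_purify`, the step size, the step count, and the quadratic toy energy (standing in for a trained EBM) are all hypothetical choices made for this example.

```python
import numpy as np

def langevin_purify(x, grad_energy, n_steps=500, step_size=0.01, rng=None):
    """Apply n_steps of unadjusted Langevin dynamics to x.

    Each step moves x down the energy gradient and adds Gaussian noise:
        x <- x - step_size * grad_E(x) + sqrt(2 * step_size) * noise
    grad_energy is any callable returning dE/dx with the same shape as x.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x, dtype=float, copy=True)  # do not modify the caller's array
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - step_size * grad_energy(x) + np.sqrt(2.0 * step_size) * noise
    return x

# Toy stand-in for a trained EBM (an assumption for illustration only):
# E(x) = 0.5 * ||x - mu||^2, whose low-energy region is centered on mu.
mu = np.zeros(8)

def toy_grad_energy(x):
    return x - mu

# A sample pushed far from the low-energy region, mimicking a perturbed input.
poisoned = np.full(8, 10.0)
purified = langevin_purify(
    poisoned, toy_grad_energy, rng=np.random.default_rng(0)
)
```

In this toy setting the transformed sample ends up much closer to the low-energy region than the input, while the injected noise keeps the transform stochastic. A real purifier would replace `toy_grad_energy` with the gradient of a learned energy over images.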
