PureEBM: Universal Poison Purification via Mid-Run Dynamics of Energy-Based Models

Omead Pooladzandi, Jeffrey Jiang, Sunay Bhat, Gregory Pottie

TL;DR

Data poisoning threatens model reliability through imperceptible backdoors or triggerless perturbations. PureEBM introduces a universal purification preprocessor that uses mid-run Langevin dynamics of a convergent Energy-Based Model to move poisoned inputs toward the natural-data energy basin via the stochastic transform $\Psi_T(x)$. The method achieves state-of-the-art defense across multiple attack types (BP, GM, NS, Narcissus) and remains effective even when the EBM is trained on poisoned or POOD data, with minimal impact on natural accuracy. Its preprocessor design, model- and dataset-agnostic applicability, and favorable compute-accuracy trade-offs enable practical deployment across diverse architectures and settings.

Abstract

Data poisoning attacks pose a significant threat to the integrity of machine learning models: by injecting adversarial examples during training, they cause misclassification of target distribution data. Existing state-of-the-art (SoTA) defense methods suffer from limitations, such as significantly reduced generalization performance and substantial training overhead, making them impractical or limited for real-world applications. In response to this challenge, we introduce a universal data purification method that defends naturally trained classifiers from malicious white-, gray-, and black-box image poisons by applying a universal stochastic preprocessing step $\Psi_{T}(x)$, realized by iterative Langevin sampling of a convergent Energy-Based Model (EBM) initialized with an image $x$. Mid-run dynamics of $\Psi_{T}(x)$ purify poison information with minimal impact on features important to the generalization of a classifier network. We show that EBMs remain universal purifiers, even in the presence of poisoned EBM training data, and achieve SoTA defense on leading triggered and triggerless poisons. This work is a subset of a larger framework introduced in \pgen with a more detailed focus on EBM purification and poison defense.
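
As a rough illustration of the preprocessing idea described in the abstract, the sketch below runs a fixed number of Langevin steps starting from an input. It is not the authors' implementation: the quadratic toy energy, the function names (`toy_energy_grad`, `langevin_purify`), the update form $x \leftarrow x - \Delta\tau \nabla_x E(x) + \eta \sqrt{2\Delta\tau}\,\varepsilon$, and all default values are assumptions chosen for the example. In practice the gradient would come from a trained EBM.

```python
import numpy as np

def toy_energy_grad(x, mu):
    """Gradient of the toy energy E(x) = 0.5 * ||x - mu||^2.

    Hypothetical stand-in for the gradient of a trained EBM.
    """
    return x - mu

def langevin_purify(x0, grad_energy, steps=150, step_size=0.01,
                    noise_scale=1.0, seed=0):
    """Run `steps` Langevin updates starting from x0 and return the result.

    The update rule and the defaults are assumptions for this sketch,
    not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=np.float64)
    for _ in range(steps):
        noise = rng.standard_normal(x.shape)
        # Gradient step toward low energy plus scaled Gaussian noise.
        x = (x - step_size * grad_energy(x)
             + noise_scale * np.sqrt(2.0 * step_size) * noise)
    return x

# Toy data: the low-energy region is centred on mu.
mu = np.zeros(16)
clean = mu + 0.1
poisoned = clean + 2.0  # exaggerated perturbation, for illustration only

purified = langevin_purify(poisoned, lambda x: toy_energy_grad(x, mu))

def energy(x):
    return 0.5 * float(np.sum((x - mu) ** 2))

print("energy before:", energy(poisoned))
print("energy after: ", energy(purified))
```

Stopping after a moderate number of steps, rather than running the chain to its steady state, is what "mid-run" refers to here. In this toy setting the chain simply drifts toward `mu`, so the sketch shows only the mechanics of the update, not the purification behavior reported for image data.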

Paper Structure (37 sections, 14 equations, 13 figures, 9 tables, 2 algorithms)

Figures (13)

  • Figure 1: Top: The full PureEBM pipeline, in which our method is applied as a preprocessing step with no further downstream changes to classifier training or inference. Poisoned images are moderately exaggerated for visibility. Bottom Left: Energy distributions of clean, poisoned, and purified images. Our method pushes poisoned images via purification into the natural image energy manifold. Bottom Right: The removal of poisons and similarity of clean and poisoned images with more MCMC steps. The purified dataset results in SoTA defense and high classifier accuracy.
  • Figure 2: Plot of $\ell_2$ distances between clean images and clean purified images (blue), clean images and poisoned purified images (green), and poisoned images and poisoned purified images (orange) at points on the MCMC sampling trajectory. Purifying poisoned images for fewer than 250 steps moves a poisoned image closer to its clean image, with the distance reaching a minimum around 150 steps, preserving the natural image while removing the adversarial features.
  • Figure 3: Defense Interpretability: A model trained with PureEBM focuses on the outline of the horse in the occlusion analysis and, in gradient space, focuses on the primary features to a higher degree than even the clean model on clean data.
  • Figure 4: Estimated loss curvature (an indicator of classifier robustness), measured by $\log\left(\left|\mathbf{H}\right|\right)$ on both the full training data and the poisoned subset. The model trained with PureEBM has the lowest curvature compared to SoTA defense methods.
  • Figure 5: Left: The maximal Lyapunov exponent varies significantly with different values of the noise parameter $\eta_{noise}$. Notably, at $\eta_{noise}= 1$, which is the setting used in our training and defense dynamics, there is a critical transition observed. This transition is from an ordered region, where the maximal exponent is zero, to a chaotic region characterized by a positive maximal exponent. This observation is crucial for understanding the underlying dynamics of our model. Right: The appearance of steady-state samples exhibits marked differences across the spectrum of $\eta_{noise}$ values. For lower values of $\eta_{noise}$, the generated images tend to be oversaturated. Conversely, higher values of $\eta_{noise}$ result in noisy images. However, there exists a narrow window around $\eta_{noise} = 1$ where a balance is achieved between gradient and noise forces, leading to realistic synthesis of images.
  • ...and 8 more figures