PUREVQ-GAN: Defending Data Poisoning Attacks through Vector-Quantized Bottlenecks

Alexander Branch, Omead Pooladzandi, Radin Khosraviani, Sunay Gajanan Bhat, Jeffrey Jiang, Gregory Pottie

TL;DR

PureVQ-GAN defends against data poisoning by routing inputs through a discrete bottleneck learned by a VQ-GAN, with a GAN discriminator enforcing a natural-image distribution. The method destroys fine-grained poison signals while preserving semantics, enabling a single-pass purification that is over 50× faster than diffusion-based purifiers. On CIFAR-10, it achieves 0% PSR against Gradient Matching and Bullseye Polytope and 1.64% against Narcissus, while maintaining about 91-95% clean accuracy and PSNR > 40 dB for reconstructions. The approach is scalable to large models and offers practical deployment in real training pipelines, with ablations showing even small codebooks can eliminate poisons. This work introduces a fast, high-fidelity defense that leverages discrete bottlenecks to disrupt adversarial perturbations and uses a GAN prior to maintain output realism.
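To make the reconstruction-quality figure concrete, here is a minimal sketch of how PSNR is conventionally computed between an original image and its reconstruction. It assumes images are float arrays scaled to [0, 1]; the function name `psnr` and the `max_value` parameter are illustrative and not taken from the paper.

```python
import numpy as np

def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB (assumes pixel range [0, max_value])."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

# A uniform per-pixel error of 0.01 on a [0, 1] image corresponds to 40 dB.
image = np.zeros((32, 32, 3))
value = psnr(image, image + 0.01)
```

Under this convention, a PSNR above 40 dB means the root-mean-square pixel error is below 1% of the dynamic range.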

Abstract

We introduce PureVQ-GAN, a defense against data poisoning that forces backdoor triggers through a discrete bottleneck using a Vector-Quantized VAE with a GAN discriminator. By quantizing poisoned images through a learned codebook, PureVQ-GAN destroys fine-grained trigger patterns while preserving semantic content. A GAN discriminator ensures outputs match the natural image distribution, preventing reconstruction of out-of-distribution perturbations. On CIFAR-10, PureVQ-GAN achieves 0% poison success rate (PSR) against Gradient Matching and Bullseye Polytope attacks, and 1.64% against Narcissus, while maintaining 91-95% clean accuracy. Unlike diffusion-based defenses, which require hundreds of iterative refinement steps, PureVQ-GAN is over 50× faster, making it practical for real training pipelines.
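The quantization step described above can be sketched as a nearest-neighbor lookup in a codebook. This is a generic vector-quantization illustration under stated assumptions, not the authors' implementation: the codebook here is random rather than learned, the encoder and decoder are omitted, and the names `quantize`, `codebook`, `clean`, and `poisoned` are hypothetical.

```python
import numpy as np

def quantize(latents, codebook):
    """Replace each latent vector with its nearest codebook entry (L2 distance).

    latents:  array of shape (N, D)
    codebook: array of shape (K, D)
    Returns the quantized latents (N, D) and the chosen code indices (N,).
    """
    diffs = latents[:, None, :] - codebook[None, :, :]   # (N, K, D)
    dists = np.sum(diffs ** 2, axis=-1)                  # (N, K)
    indices = np.argmin(dists, axis=1)                   # (N,)
    return codebook[indices], indices

rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 8))  # K=64 codes of dimension 8 (illustrative sizes)

# Stand-ins for encoder outputs: latents near codebook entries, with and
# without a small additive perturbation playing the role of a poison signal.
clean = codebook[rng.integers(0, 64, size=16)] + 0.01 * rng.normal(size=(16, 8))
poisoned = clean + 0.01 * rng.normal(size=(16, 8))

q_clean, idx_clean = quantize(clean, codebook)
q_poisoned, idx_poisoned = quantize(poisoned, codebook)
```

In this toy setup, a perturbation that is small relative to the spacing between codes leaves the selected indices unchanged, so the quantized output carries no trace of it. That is the intuition behind a discrete bottleneck discarding fine-grained signals; whether it holds for a real attack depends on the learned encoder and codebook.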

Paper Structure

This paper contains 16 sections, 4 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: PureVQ-GAN architecture.
  • Figure 2: Ablations. (a) Larger models improve clean accuracy, with diminishing returns beyond 50M parameters (trend verified up to 300M). (b) Even a codebook of size K=64 achieves 0% PSR; larger K improves reconstruction.
  • Figure 3: Visual comparison. PureVQ-GAN removes triggers while maintaining image quality.