Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders

Yi Yu, Yufei Wang, Song Xia, Wenhan Yang, Shijian Lu, Yap-Peng Tan, Alex C. Kot

TL;DR

This work tackles the vulnerability of machine learning models to unlearnable examples (UEs) by proposing pre-training purification with rate-constrained variational autoencoders (VAEs). It introduces D-VAE, a VAE variant with learnable class-wise embeddings and an auxiliary decoder that disentangles and recovers the perturbations. D-VAE is followed by a two-stage purification framework that first removes most of the perturbations and then refines the results. The approach achieves strong empirical performance on CIFAR-10, CIFAR-100, and a 100-class ImageNet subset, outperforming training-time defenses and other pre-training methods while remaining computationally efficient. The results suggest a practical pathway for data-centric defense against data poisoning, and the released code enables replication and benchmarking in real-world pipelines.

Abstract

Unlearnable examples (UEs) seek to maximize test error by making subtle modifications to correctly labeled training examples. Defenses against these poisoning attacks fall into two categories, depending on whether they intervene during training. The first is training-time defense, such as adversarial training, which can mitigate poisoning effects but is computationally intensive. The second is pre-training purification, e.g., image shortcut squeezing, which applies a few simple compressions but often struggles against diverse UEs. Our work introduces a novel disentanglement mechanism to build an efficient pre-training purification method. First, we find that rate-constrained variational autoencoders (VAEs) exhibit a clear tendency to suppress the perturbations in UEs, and we provide a theoretical analysis of this phenomenon. Building on these insights, we introduce a disentangle variational autoencoder (D-VAE), which disentangles the perturbations using learnable class-wise embeddings. Based on this network, we develop a two-stage purification approach. The first stage roughly eliminates the perturbations, while the second stage produces refined, poison-free results, ensuring effectiveness and robustness across various scenarios. Extensive experiments demonstrate strong performance across CIFAR-10, CIFAR-100, and a 100-class ImageNet subset. Code is available at https://github.com/yuyi-sd/D-VAE.
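
The abstract describes a rate-constrained VAE whose auxiliary decoder isolates the perturbation. The sketch below shows one plausible per-example training objective under stated assumptions. (i) The "rate" is the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior, which is standard VAE math. (ii) The rate constraint is imposed as a hinge penalty `lam * max(0, KL - rate_target)`; this form is our assumption, and the paper's exact formulation may differ. (iii) The main output `x_hat` and the auxiliary output `p_hat` are trained so that their sum reconstructs the unlearnable input x = x_c + p, which is also an assumption. All function and parameter names are hypothetical, and images are treated as flattened lists of floats.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), i.e. the VAE rate in nats."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def d_vae_objective(
    x: Sequence[float],       # unlearnable input, assumed to equal x_c + p
    x_hat: Sequence[float],   # main-decoder output (content estimate)
    p_hat: Sequence[float],   # auxiliary-decoder output (perturbation estimate)
    mu: Sequence[float],      # posterior mean of the latent z
    logvar: Sequence[float],  # posterior log-variance of the latent z
    rate_target: float,       # hypothetical rate budget in nats
    lam: float = 1.0,         # hypothetical penalty weight
) -> float:
    """Hypothetical loss: MSE(x, x_hat + p_hat) + lam * max(0, KL - rate_target)."""
    if not (len(x) == len(x_hat) == len(p_hat)) or not x:
        raise ValueError("x, x_hat and p_hat must be non-empty and equally long")
    # Reconstruction term: the two branches together should explain the input.
    recon = sum((xi - (a + b)) ** 2 for xi, a, b in zip(x, x_hat, p_hat)) / len(x)
    # Rate term: only the amount of KL above the budget is penalized.
    rate = gaussian_kl(mu, logvar)
    return recon + lam * max(0.0, rate - rate_target)
```

A tight `rate_target` limits how much information the latent z can carry, which is consistent with the abstract's observation that rate-constrained VAEs tend to suppress UE perturbations. Under this sketch, purification would presumably keep `x_hat` and discard `p_hat`.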

Paper Structure (37 sections, 4 theorems, 41 equations, 6 figures, 13 tables)

Key Result

Proposition 3.1

For features ${\boldsymbol{v}} = ({\boldsymbol{v}}_c, {\boldsymbol{v}}_s^t)$ following the assumed feature distribution, the optimal separating hyperplane of a Bayes classifier is given by:

Figures (6)

  • Figure 1: (a) Visual depiction of D-VAE, which contains two components. One component generates reconstructed images $\boldsymbol{\hat{x}}$ that preserve the primary content of the unlearnable inputs $\boldsymbol{x}$. The auxiliary decoder maps a trainable class-wise embedding $\boldsymbol{u}_y$ and the latents $\boldsymbol{z}$ to the disentangled perturbations $\hat{\boldsymbol{p}}$. Here, $\boldsymbol{x}_c$ is the clean data and $\boldsymbol{p}$ is the added perturbation; perturbations are normalized for better visualization. (b) The two-stage purification framework. The overall purification can be written as $\boldsymbol{x}^3 = \boldsymbol{g}(\boldsymbol{x}^0)$, where $\boldsymbol{x}^0$ is the original unlearnable data.
  • Figure 2: (a) Results of VAEs: PSNR and test accuracy vs. KLD loss, evaluated on unlearnable CIFAR-10. (b) Comparison between VAEs and JPEG compression: PSNR vs. test accuracy. JPEG quality is set to {2, 5, 10, 30, 50, 70, 90} to control the corruption level. EM, REM, and LSP are used as the UE methods.
  • Figure 3: Test accuracy (%) for each training epoch when using adversarial augmentation (Qin et al., 2023).
  • Figure 4: Visual results of images before and after purification. The stage 2 results are the final purified outputs. The image is from the ImageNet subset, and the residuals with respect to the clean images are normalized in two different ways.
  • Figure 5: Comparison between VAEs and AEs: PSNR vs. test accuracy, with EM, REM, and LSP as the attack methods.
  • ...and 1 more figure
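
Figures 2 and 5 plot PSNR against test accuracy. For reference, here is a minimal sketch of the standard PSNR computation, PSNR = 10 log10(MAX^2 / MSE), for pixel values scaled to [0, 1]. The summary does not say which image serves as the reference (clean or unlearnable), so that choice is left to the caller. The function name is illustrative.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized flattened images."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and equally long")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_val ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives MSE = 0.01, i.e. about 20 dB.
print(round(psnr([0.0] * 4, [0.1] * 4), 2))
```

Higher PSNR means the output is closer to the reference. Figure 2(b) uses this axis to compare VAEs with JPEG compression at several quality levels.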

Theorems & Definitions (5)

  • Proposition 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Remark 3.4
  • Proposition 3.5
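
Proposition 3.1 characterizes the Bayes-optimal separating hyperplane for features split into a content part v_c and a perturbation part v_s^t. The exact distribution is not reproduced in this summary, so the sketch below uses a textbook stand-in that we assume for illustration only. The model has two classes with Gaussian class-conditionals N(mu_+, diag(var)) and N(mu_-, diag(var)) that share a diagonal covariance. In that model the Bayes rule is linear with weights w_i = (mu_+,i - mu_-,i) / var_i, so low-variance dimensions receive large weights. Function names are hypothetical, and this is not the paper's formula.

```python
import math
from typing import List, Sequence, Tuple


def bayes_hyperplane(
    mu_pos: Sequence[float],
    mu_neg: Sequence[float],
    var: Sequence[float],
    prior_pos: float = 0.5,
) -> Tuple[List[float], float]:
    """(w, b) of the Bayes-optimal rule sign(w . v + b) for N(mu_pos, diag(var)) vs N(mu_neg, diag(var))."""
    if not 0.0 < prior_pos < 1.0:
        raise ValueError("prior_pos must lie strictly between 0 and 1")
    # log p(v|+) - log p(v|-) = sum_i v_i (mu+_i - mu-_i) / var_i - 0.5 * sum_i (mu+_i^2 - mu-_i^2) / var_i
    w = [(mp - mn) / s for mp, mn, s in zip(mu_pos, mu_neg, var)]
    b = -0.5 * sum((mp * mp - mn * mn) / s for mp, mn, s in zip(mu_pos, mu_neg, var))
    b += math.log(prior_pos / (1.0 - prior_pos))  # zero for equal priors
    return w, b


def classify(w: Sequence[float], b: float, v: Sequence[float]) -> int:
    """Return +1 or -1 depending on which side of the hyperplane v falls."""
    return 1 if sum(wi * vi for wi, vi in zip(w, v)) + b >= 0.0 else -1


# Dimension 0: a "content" feature with means +/-1 and variance 1.
# Dimension 1: a weak but nearly noiseless "perturbation" feature with means +/-0.1 and variance 0.001.
w, b = bayes_hyperplane([1.0, 0.1], [-1.0, -0.1], [1.0, 0.001])
print(w, b)                          # the perturbation weight is ~100x the content weight
print(classify(w, b, [0.5, -0.1]))   # the perturbation dimension outvotes the content dimension
```

This toy example illustrates the commonly cited shortcut explanation of UEs: a small but consistent, class-correlated perturbation can dominate a Bayes-optimal decision when its variance is low.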