Towards Provably Unlearnable Examples via Bayes Error Optimisation

Ruihan Zhang, Jun Sun, Ee-Peng Lim, Peixin Zhang

TL;DR

The paper addresses how to keep data from being effectively learned, even when it is mixed with clean training data, by introducing a Bayes-error-based framework. It formalizes unlearnability via the Bayes error $\beta_D$ and provides a differentiable estimator of it from samples using local posterior averaging, which enables optimization under an $L^p$-bounded perturbation budget via projected gradient ascent. The method provably increases the Bayes error and remains effective when unlearnable examples are mixed with clean data. Experiments on CIFAR-10/100 and Tiny ImageNet show substantial test-accuracy drops and transferability across architectures. The method is also resilient to adversarial training, which suggests practical utility for data owners who want to protect personal data at the source. Defenses against generative-model threats are left as future work.

Abstract

The recent success of machine learning models, especially large-scale classifiers and language models, relies heavily on training with massive data. These data are often collected from online sources. This raises serious concerns about the protection of user data, as individuals may not have given consent for their data to be used in training. To address this concern, recent studies introduce the concept of unlearnable examples, i.e., data instances that appear natural but are intentionally altered to prevent models from effectively learning from them. While existing methods demonstrate empirical effectiveness, they typically rely on heuristic trials and lack formal guarantees. Moreover, when unlearnable examples are mixed with clean data, as is often the case in practice, their unlearnability disappears. In this work, we propose a novel approach to constructing unlearnable examples by systematically maximising the Bayes error, a measure of irreducible classification error. We develop an optimisation-based approach and provide an efficient solution using projected gradient ascent. Our method provably increases the Bayes error and remains effective when the unlearnable examples are mixed with clean samples. Experimental results across multiple datasets and model architectures are consistent with our theoretical analysis and show that our approach can effectively restrict data learnability in practice.
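
The optimisation loop described in the abstract can be illustrated on toy data. The following is a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian similarity, the bandwidth, the plug-in form of the estimate (mean of one minus the largest estimated posterior), the $L^\infty$ budget `eps`, and the finite-difference gradient (a stand-in for automatic differentiation) are all illustrative choices, and every function name is hypothetical.

```python
import math

BANDWIDTH = 0.5  # assumed kernel width, not taken from the paper


def bayes_error_estimate(points, labels):
    """Plug-in estimate: mean over samples of 1 - max_k eta_k(x_i), where
    eta_k is a similarity-weighted average of the neighbouring labels."""
    classes = sorted(set(labels))
    total = 0.0
    for xi in points:
        mass = {k: 0.0 for k in classes}
        for xj, yj in zip(points, labels):
            sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
            # Gaussian similarity: symmetric and non-negative (assumed form).
            mass[yj] += math.exp(-sq_dist / (2.0 * BANDWIDTH ** 2))
        total += 1.0 - max(mass.values()) / sum(mass.values())
    return total / len(points)


def perturbed(points, delta):
    """Return the points shifted by the perturbation delta."""
    return [[a + b for a, b in zip(p, d)] for p, d in zip(points, delta)]


def numerical_gradient(points, labels, delta, h=1e-4):
    """Central finite differences of the estimate with respect to delta."""
    grad = [[0.0] * len(row) for row in delta]
    for i, row in enumerate(delta):
        for c in range(len(row)):
            original = row[c]
            row[c] = original + h
            up = bayes_error_estimate(perturbed(points, delta), labels)
            row[c] = original - h
            down = bayes_error_estimate(perturbed(points, delta), labels)
            row[c] = original
            grad[i][c] = (up - down) / (2.0 * h)
    return grad


def maximise_bayes_error(points, labels, eps=0.3, lr=1.0, steps=10):
    """Projected gradient ascent; returns the best perturbation seen."""
    delta = [[0.0] * len(p) for p in points]
    best_delta = [row[:] for row in delta]
    best_value = bayes_error_estimate(points, labels)
    for _ in range(steps):
        grad = numerical_gradient(points, labels, delta)
        for row, g_row in zip(delta, grad):
            for c in range(len(row)):
                stepped = row[c] + lr * g_row[c]       # ascent step
                row[c] = max(-eps, min(eps, stepped))  # project onto L-inf ball
        value = bayes_error_estimate(perturbed(points, delta), labels)
        if value > best_value:
            best_value, best_delta = value, [row[:] for row in delta]
    return best_delta, best_value


# Toy data (hypothetical): two 3x3 grids of 2-D points, one per class.
points = [[cx + dx, dy]
          for cx in (-1.0, 1.0)
          for dx in (-0.5, 0.0, 0.5)
          for dy in (-0.5, 0.0, 0.5)]
labels = [0] * 9 + [1] * 9

before = bayes_error_estimate(points, labels)
delta, after = maximise_bayes_error(points, labels)
print(f"estimated Bayes error before: {before:.4f}, after: {after:.4f}")
```

Clipping each coordinate of the perturbation to `[-eps, eps]` is the exact projection onto an $L^\infty$ ball; a different $L^p$ budget would need a different projection step.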

Paper Structure

This paper contains 21 sections, 5 theorems, 20 equations, 6 figures, 1 table, 2 algorithms.

Key Result

Lemma 3.1

Let $s$ be a symmetric, non-negative similarity function satisfying the following conditions: [...]. Then the posterior estimate from the posterior-estimation algorithm (alg:posterior) converges to the true posterior, i.e., [...], where $C_1$ and $C_2$ are constants.

Figures (6)

  • Figure 1: A joint distribution of two truncated normal distributions. (Left) The density function of this distribution. The grey-shaded area represents the Bayes error. (Right) A sample is drawn from this distribution. Note that the samples are randomly placed in the vertical direction.
  • Figure 2: Perturbation of two-dimensional points.
  • Figure 3: Examples before and after perturbation
  • Figure 4: Test accuracy on each setting. "No add-on" means only training on part of the original training set.
  • Figure 5: Test accuracy for complex datasets
  • ...and 1 more figure
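
The shaded area in Figure 1 can be reproduced numerically. For two classes with priors $\pi_0, \pi_1$ and class-conditional densities $p_0, p_1$, the Bayes error is $\int \min(\pi_0 p_0(x), \pi_1 p_1(x))\,dx$. The sketch below evaluates this integral for two truncated normals; the means, standard deviation, truncation interval, and equal priors are assumed values chosen for illustration, not the parameters used in the paper's figure.

```python
import math


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def truncated_pdf(x, mean, std, lo, hi):
    """Density of a normal distribution truncated to [lo, hi]."""
    if x < lo or x > hi:
        return 0.0
    norm = normal_cdf(hi, mean, std) - normal_cdf(lo, mean, std)
    return normal_pdf(x, mean, std) / norm


def bayes_error(mean0, mean1, std, lo, hi, prior0=0.5, n=20000):
    """Midpoint-rule integral of min(prior0 * p0, prior1 * p1) over [lo, hi]."""
    width = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * width
        total += min(prior0 * truncated_pdf(x, mean0, std, lo, hi),
                     (1.0 - prior0) * truncated_pdf(x, mean1, std, lo, hi))
    return total * width


# Assumed parameters: classes centred at -1 and +1, unit variance, on [-3, 3].
estimate = bayes_error(-1.0, 1.0, 1.0, -3.0, 3.0)

# By symmetry the optimal boundary is x = 0, so the error equals the
# probability that a class-0 sample lands to the right of 0.
closed_form = ((normal_cdf(3.0, -1.0, 1.0) - normal_cdf(0.0, -1.0, 1.0))
               / (normal_cdf(3.0, -1.0, 1.0) - normal_cdf(-3.0, -1.0, 1.0)))
print(f"numerical: {estimate:.6f}, closed form: {closed_form:.6f}")
```

No classifier trained on samples from this distribution can achieve a lower expected error than this value, which is why increasing it limits what any model can learn from the data.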

Theorems & Definitions (9)

  • Example 1
  • Lemma 3.1
  • Theorem 3.2
  • proof
  • Example 2
  • Lemma 3.3
  • Theorem 3.4
  • proof
  • Corollary 3.5