Slowing Learning by Erasing Simple Features

Lucia Quirke, Nora Belrose

TL;DR

The paper probes the learning biases of neural networks by erasing simple statistical cues from image classification data. It compares LEACE, which removes linearly available (first-order) information about class labels, with quadratic erasure: the newly derived closed-form QLEACE, grounded in optimal transport theory, and two approximate variants, ALF-QLEACE and a gradient-based method. Across CIFAR-10, CIFARNet, and SVHN, and across architectures ranging from ReLU MLPs to ConvNeXt V2, LEACE consistently slows learning, while quadratic erasure gives mixed, architecture-dependent results: QLEACE slows learning in feedforward networks but can backfire in more sophisticated architectures, which learn to exploit the higher-order information it injects. The approximate variants avoid injecting information but act like data augmentation on some datasets, speeding up learning relative to LEACE. The work highlights both the potential and the pitfalls of concept erasure for interpretability and robustness, and calls for caution when applying quadratic erasure in practice.

Abstract

Prior work suggests that neural networks tend to learn low-order moments of the data distribution first, before moving on to higher-order correlations. In this work, we derive a novel closed-form concept erasure method, QLEACE, which surgically removes all quadratically available information about a concept from a representation. Through comparisons with linear erasure (LEACE) and two approximate forms of quadratic erasure, we explore whether networks can still learn when low-order statistics are removed from image classification datasets. We find that while LEACE consistently slows learning, quadratic erasure can exhibit both positive and negative effects on learning speed depending on the choice of dataset, model architecture, and erasure method. Use of QLEACE consistently slows learning in feedforward architectures, but more sophisticated architectures learn to use injected higher order Shannon information about class labels. Its approximate variants avoid injecting information, but surprisingly act as data augmentation techniques on some datasets, enhancing learning speed compared to LEACE.
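To make "removing linearly available information" concrete, here is a minimal NumPy sketch of a LEACE-style linear eraser. It is an illustration under stated assumptions, not the authors' implementation: the function name `fit_linear_eraser`, the synthetic data, and the numerical thresholds are all hypothetical. The sketch whitens the features, projects out the directions that covary with the one-hot labels, and unwhitens, so that the erased features have zero cross-covariance with the labels and therefore identical class-conditional means.

```python
import numpy as np

def fit_linear_eraser(X, Z):
    """Fit an affine map that removes the cross-covariance between X and Z.

    X: (n, d) feature matrix; Z: (n, k) one-hot label matrix.
    Returns a function mapping (m, d) arrays to erased (m, d) arrays.
    """
    n = X.shape[0]
    mu = X.mean(axis=0)
    Xc = X - mu
    Zc = Z - Z.mean(axis=0)
    sigma_xx = Xc.T @ Xc / n          # feature covariance
    sigma_xz = Xc.T @ Zc / n          # feature/label cross-covariance

    # Whitening matrix W = sigma_xx^{-1/2} and its (pseudo)inverse,
    # restricted to directions with non-negligible variance.
    evals, evecs = np.linalg.eigh(sigma_xx)
    keep = evals > 1e-10 * evals.max()
    evals, evecs = evals[keep], evecs[:, keep]
    W = (evecs / np.sqrt(evals)) @ evecs.T
    W_inv = (evecs * np.sqrt(evals)) @ evecs.T

    # Orthogonal projection onto the column space of W @ sigma_xz.
    U, s, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, s > 1e-10 * s.max()]
    P = U @ U.T

    A = W_inv @ P @ W                 # the part of each centered x to remove
    return lambda x: x - (x - mu) @ A.T

# Hypothetical synthetic data: three classes with shifted means.
rng = np.random.default_rng(0)
n, d, k = 600, 8, 3
labels = rng.integers(0, k, size=n)
class_means = 2.0 * rng.normal(size=(k, d))
X = rng.normal(size=(n, d)) + class_means[labels]
Z = np.eye(k)[labels]

eraser = fit_linear_eraser(X, Z)
erased = eraser(X)

# Largest gap between any class-conditional mean and the overall mean.
max_gap = max(
    np.abs(erased[labels == c].mean(axis=0) - erased.mean(axis=0)).max()
    for c in range(k)
)
print("largest class-mean gap after erasure:", max_gap)
```

On this toy data the class means coincide after erasure, so a linear classifier trained on `erased` with a convex loss has nothing to exploit, while higher-order differences between classes (none were planted here) would be left untouched.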

Paper Structure

This paper contains 18 sections, 7 theorems, 18 equations, 16 figures.

Key Result

Theorem 2.2

Suppose $\mathcal{L}$ is convex in $\eta(\boldsymbol{x})$. Then if for each class $\boldsymbol{z} \in \mathcal{Z}$ and each order $n \in \{1, \ldots, N\}$, the tensor of class-conditional moments $\mathbb{E}[\mathrm X_{i_1} \cdots \mathrm X_{i_n} \mid \mathrm Z = \boldsymbol{z}]$ is equal to the

Figures (16)

  • Figure 1: Ship from the CIFARNet training set edited with each eraser. LEACE and gradient-based quadratic erasure minimally affect intelligibility of the image, while QLEACE and ALF-QLEACE reduce intelligibility. The ALF-QLEACE intervention is rank $d - 15$. A random projection of equal rank is included for comparison.
  • Figure 2: Images can be easily identified after gradient-based erasure. Non-cherrypicked examples from CIFAR-10, CIFARNet, and SVHN.
  • Figure 3: Increase in MDL from the erasure of the CIFAR-10 dataset over 5 random seeds for ReLU MLPs and ConvNeXt V2s of various widths. While LEACE slows learning similarly in both architectures, ConvNeXts are less affected by quadratic erasure and exhibit a backfiring effect on data modified with QLEACE, resulting in improved performance relative to unerased data.
  • Figure 4: MDL over 5 random seeds for feedforward networks of various depths and widths on linearly erased CIFAR-10. All models have a constant depth of 2 when width is varying and a constant width of 128 when depth is varying.
  • Figure 5: MDL over 5 random seeds for feedforward networks of various depths and widths on quadratically erased CIFAR-10. All models have a constant depth of 2 when width is varying and a constant width of 128 when depth is varying.
  • ...and 11 more figures

Theorems & Definitions (15)

  • Definition 2.1: Polynomial Predictor
  • Theorem 2.2
  • proof
  • Theorem 2.3
  • proof
  • Lemma 2.4: Gaussian Wasserstein Barycenter
  • proof
  • Lemma 2.5: 2-Wasserstein Lower Bound
  • proof
  • Theorem 2.6: Quadratic LEACE
  • ...and 5 more
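
The entries for Lemma 2.4 (Gaussian Wasserstein Barycenter) and Theorem 2.6 (Quadratic LEACE) suggest that quadratic erasure moves each class toward a shared mean and covariance using optimal transport. The sketch below is one plausible reading of that idea, not the paper's stated formula: it computes the covariance of a Gaussian Wasserstein barycenter with a commonly used fixed-point iteration, then applies the standard Gaussian optimal transport map to each class. All function names, the iteration count, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def sqrtm_psd(S):
    """Symmetric square root of a positive semidefinite matrix."""
    evals, evecs = np.linalg.eigh((S + S.T) / 2)
    return (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T

def inv_sqrtm_pd(S):
    """Inverse symmetric square root of a positive definite matrix."""
    evals, evecs = np.linalg.eigh((S + S.T) / 2)
    return (evecs / np.sqrt(evals)) @ evecs.T

def gaussian_barycenter_cov(covs, weights, n_iter=100):
    """Fixed-point iteration for the barycenter covariance (assumed scheme)."""
    S = sum(w * C for w, C in zip(weights, covs))  # start from the mixture
    for _ in range(n_iter):
        S_half, S_inv_half = sqrtm_psd(S), inv_sqrtm_pd(S)
        T = sum(w * sqrtm_psd(S_half @ C @ S_half) for w, C in zip(weights, covs))
        S = S_inv_half @ T @ T @ S_inv_half
        S = (S + S.T) / 2
    return S

def gaussian_ot_map(cov_src, cov_tgt):
    """Linear part A of the optimal transport map between centered Gaussians."""
    src_half, src_inv_half = sqrtm_psd(cov_src), inv_sqrtm_pd(cov_src)
    return src_inv_half @ sqrtm_psd(src_half @ cov_tgt @ src_half) @ src_inv_half

def equalize_first_two_moments(X, labels):
    """Give every class the same empirical mean and covariance."""
    classes = np.unique(labels)
    weights = [(labels == c).mean() for c in classes]
    means = [X[labels == c].mean(axis=0) for c in classes]
    covs = [np.cov(X[labels == c], rowvar=False, bias=True) for c in classes]
    target_mean = sum(w * m for w, m in zip(weights, means))
    target_cov = gaussian_barycenter_cov(covs, weights)
    out = np.empty_like(X, dtype=float)
    for c, m, C in zip(classes, means, covs):
        A = gaussian_ot_map(C, target_cov)  # symmetric, so A C A = target_cov
        out[labels == c] = target_mean + (X[labels == c] - m) @ A.T
    return out

# Hypothetical synthetic data: classes differ in both mean and per-axis scale.
rng = np.random.default_rng(0)
n, d, k = 900, 5, 3
labels = rng.integers(0, k, size=n)
scales = rng.uniform(0.5, 2.0, size=(k, d))
shifts = rng.normal(size=(k, d))
X = rng.normal(size=(n, d)) * scales[labels] + shifts[labels]

X_erased = equalize_first_two_moments(X, labels)
class_covs = [np.cov(X_erased[labels == c], rowvar=False, bias=True) for c in range(k)]
print("largest covariance gap between classes:",
      max(np.abs(C - class_covs[0]).max() for C in class_covs))
```

Because each class receives its own affine map, the edit depends on the label. That is consistent with the abstract's remark that QLEACE can inject higher-order information about class labels that sophisticated architectures learn to exploit.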