
Data Augmentation and Regularization for Learning Group Equivariance

Oskar Nordenfors, Axel Flinth

TL;DR

This work investigates learning group equivariance by combining data augmentation with a regularization penalty that suppresses the non-equivariant components of neural network parameters. Building on a prior framework, it shows that augmenting the training data with symmetry transformations and adding the penalty $\frac{\gamma}{2}\|\Pi_{\mathcal{E}^{\perp}}A\|^2$, where $\Pi_{\mathcal{E}^{\perp}}$ is the orthogonal projection onto the complement of the equivariant subspace $\mathcal{E}$, makes $\mathcal{E}$ an attractor of the training dynamics for sufficiently large $\gamma$. The key contributions are a formal analysis of augmented versus equivariant dynamics and a small-scale SGD experiment confirming the attractor behavior, which suggests a practical route to equivariance without hard architectural constraints. The findings could help exploit known symmetries in diverse architectures, since combining augmentation with regularization yields provably equivariant behavior in training.
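To make the penalty concrete, here is a minimal NumPy sketch for one assumed setting that the source does not specify: a linear layer $A \in \mathbb{R}^{n \times n}$ with the cyclic group $C_n$ acting on inputs and outputs by cyclic shifts. Because the shift matrices are orthogonal, averaging $\rho(g) A \rho(g)^{\top}$ over the group gives the orthogonal projection $\Pi_{\mathcal{E}}$ onto the equivariant (here: circulant) matrices, and the gradient of $\frac{\gamma}{2}\|\Pi_{\mathcal{E}^{\perp}}A\|^2$ is $\gamma\,\Pi_{\mathcal{E}^{\perp}}A$. The helper names `shift_matrices`, `project_equivariant` and `penalty` are hypothetical.

```python
import numpy as np


def shift_matrices(n):
    """All n cyclic-shift permutation matrices P^k: an orthogonal representation of C_n."""
    P = np.roll(np.eye(n), 1, axis=0)
    return [np.linalg.matrix_power(P, k) for k in range(n)]


def project_equivariant(A, reps):
    """Pi_E A: group average of R A R^T (an orthogonal projection since every R is orthogonal)."""
    return sum(R @ A @ R.T for R in reps) / len(reps)


def penalty(A, reps, gamma):
    """Return gamma/2 * ||Pi_{E^perp} A||_F^2 and its gradient gamma * Pi_{E^perp} A."""
    A_perp = A - project_equivariant(A, reps)
    return 0.5 * gamma * np.sum(A_perp ** 2), gamma * A_perp


# Usage: the projection lands in E, where the penalty and its gradient vanish.
rng = np.random.default_rng(0)
reps = shift_matrices(5)
A = rng.standard_normal((5, 5))
A_eq = project_equivariant(A, reps)
print(all(np.allclose(R @ A_eq, A_eq @ R) for R in reps))  # True
value, grad = penalty(A_eq, reps, gamma=10.0)
print(np.isclose(value, 0.0), np.allclose(grad, 0.0))  # True True
```

Since $\Pi_{\mathcal{E}^{\perp}}$ is self-adjoint and idempotent, the gradient step on the penalty simply shrinks the non-equivariant part of $A$ and leaves its equivariant part untouched.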

Abstract

In many machine learning tasks, known symmetries can be used as an inductive bias to improve model performance. In this paper, we consider learning group equivariance through training with data augmentation. We summarize results from a previous paper of our own and extend them to show that equivariance of the trained model can be achieved by training on augmented data in tandem with regularization.

Paper Structure

This paper contains 11 sections, 3 theorems, 16 equations, 2 figures.

Key Result

Theorem 1

1. $\mathcal{E}$ is an invariant set of the augmented dynamics.
2. The sets $S^{\mathrm{eq}}$ and $S^{\mathrm{aug}}$ of points in $\mathcal{E}$ that are stationary for the equivariant and augmented dynamics, respectively, agree: $S^{\mathrm{eq}} = S^{\mathrm{aug}}$.
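Both items can be checked numerically in a toy case. The sketch below rests on assumptions the source does not state: a linear model $x \mapsto Ax$ with squared loss, $C_n$ acting by cyclic shifts on inputs and targets, augmented dynamics given by the gradient of the loss averaged over all shifted copies of the data, and equivariant dynamics given by the nominal gradient projected onto $\mathcal{E}$. At a point $A \in \mathcal{E}$ the augmented gradient then lies in $\mathcal{E}$ (item 1) and coincides with the projected nominal gradient, so both flows have the same stationary points in $\mathcal{E}$ (item 2). All function names are hypothetical.

```python
import numpy as np


def shift_matrices(n):
    """Cyclic-shift permutation matrices P^k, k = 0..n-1 (orthogonal representation of C_n)."""
    P = np.roll(np.eye(n), 1, axis=0)
    return [np.linalg.matrix_power(P, k) for k in range(n)]


def project_equivariant(A, reps):
    """Orthogonal projection onto E = {A : R A = A R for all R} via group averaging."""
    return sum(R @ A @ R.T for R in reps) / len(reps)


def grad_nominal(A, X, Y):
    """Gradient of 0.5 * ||A X - Y||_F^2; the columns of X and Y are samples."""
    return (A @ X - Y) @ X.T


def grad_augmented(A, X, Y, reps):
    """Gradient of the nominal loss averaged over all transformed datasets (R X, R Y)."""
    return sum(grad_nominal(A, R @ X, R @ Y) for R in reps) / len(reps)


rng = np.random.default_rng(0)
n, m = 6, 10
reps = shift_matrices(n)
X, Y = rng.standard_normal((n, m)), rng.standard_normal((n, m))
A = project_equivariant(rng.standard_normal((n, n)), reps)  # a point in E

g_aug = grad_augmented(A, X, Y, reps)
g_eq = project_equivariant(grad_nominal(A, X, Y), reps)
print(np.allclose(g_aug, project_equivariant(g_aug, reps)))  # item 1: True
print(np.allclose(g_aug, g_eq))  # item 2 (toy version): True
```

For a generic non-equivariant starting point the augmented gradient does have a component in $\mathcal{E}^{\perp}$, so these identities only describe the behavior on $\mathcal{E}$ itself.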

Figures (2)

  • Figure 1: Regardless of how far from $\mathcal{E}$ training starts, the regularization parameter $\gamma$ can be chosen large enough that the dynamics $\dot{A}=-\nabla S_{\gamma}(A)$ converge to $\mathcal{E}$ exponentially fast. If $\gamma$ is chosen too small, the dynamics might not converge to $\mathcal{E}$ at all.
  • Figure 2: Projection errors for the two non-equivariant models for different values of $\gamma$. Notice the logarithmic $y$-scale. Opaque lines are medians, and transparent lines are individual runs. Best viewed in color.
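The convergence in Figure 1 can be mimicked with an explicit Euler discretization of $\dot{A}=-\nabla S_{\gamma}(A)$. This sketch assumes that $S_{\gamma}$ is the augmented risk plus the penalty $\frac{\gamma}{2}\|\Pi_{\mathcal{E}^{\perp}}A\|^2$, in a toy linear setting (cyclic shifts acting on inputs and targets, squared loss); the step size, data and helper names are illustrative choices, not the paper's experiment. It records the projection error $\|\Pi_{\mathcal{E}^{\perp}}A\|_F$ along the trajectory.

```python
import numpy as np


def shift_matrices(n):
    """Cyclic-shift permutation matrices P^k (orthogonal representation of C_n)."""
    P = np.roll(np.eye(n), 1, axis=0)
    return [np.linalg.matrix_power(P, k) for k in range(n)]


def project_equivariant(A, reps):
    """Orthogonal projection onto the equivariant (circulant) matrices."""
    return sum(R @ A @ R.T for R in reps) / len(reps)


def grad_S(A, X, Y, reps, gamma):
    """Gradient of the augmented squared loss plus gamma/2 * ||Pi_{E^perp} A||_F^2."""
    g_aug = sum((A @ R @ X - R @ Y) @ X.T @ R.T for R in reps) / len(reps)
    return g_aug + gamma * (A - project_equivariant(A, reps))


def projection_errors(A0, X, Y, reps, gamma, lr=1e-3, steps=2000):
    """Euler steps A <- A - lr * grad S_gamma(A); returns ||Pi_{E^perp} A||_F after each step."""
    A, errs = A0.copy(), []
    for _ in range(steps):
        A = A - lr * grad_S(A, X, Y, reps, gamma)
        errs.append(np.linalg.norm(A - project_equivariant(A, reps)))
    return errs


rng = np.random.default_rng(0)
n, m = 6, 10
reps = shift_matrices(n)
X, Y = rng.standard_normal((n, m)), rng.standard_normal((n, m))
A0 = 5.0 * rng.standard_normal((n, n))  # start far from E
for gamma in (0.0, 1.0, 10.0):
    final = projection_errors(A0, X, Y, reps, gamma)[-1]
    print(f"gamma={gamma:5.1f}  final projection error={final:.2e}")
```

In this linear toy the dynamics decouple: the non-equivariant part decays at rate $\gamma + \lambda_{\min}$, where $\lambda_{\min}$ is the smallest eigenvalue of the group-averaged data covariance, so the augmented term alone already damps it and the penalty adds a guaranteed rate of $\gamma$. The failure for too small $\gamma$ described in Figure 1 is not reproduced here and would require a nonlinear model.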

Theorems & Definitions (8)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Theorem 2
  • Theorem 3: Equivariant subspace is attractor of regularized augmented gradient flow
  • Proof
  • Remark 1
  • Remark 2