Ensembles provably learn equivariance through data augmentation

Oskar Nordenfors; Axel Flinth

TL;DR

The paper addresses whether ensemble learning with data augmentation yields equivariance beyond the neural tangent kernel regime, and shows that this phenomenon holds for general architectures under a simple condition: the architecture space must be invariant to the group action. It proves that gradient flow with full augmentation and SGD with random augmentations produce equivariant ensembles when the architecture space is ρ-invariant and the loss is ρ-invariant, and it provides finite-ensemble sample complexity bounds. The contribution lies in unifying and extending prior NTK-based results to realistic settings, including transformers and stochastic augmentation, and validating the theory with experiments on discrete rotations and continuous SO(3) rotations. This work broadens the theoretical foundation for leveraging symmetries in ensemble methods and informs architectural design and augmentation strategies for improved equivariance in practice.

Abstract

Recently, it was proved that group equivariance emerges in ensembles of neural networks as the result of full augmentation in the limit of infinitely wide neural networks (neural tangent kernel limit). In this paper, we extend this result significantly. We provide a proof that this emergence does not depend on the neural tangent kernel limit at all. We also consider stochastic settings, and furthermore general architectures. For the latter, we provide a simple sufficient condition on the relation between the architecture and the action of the group for our results to hold. We validate our findings through simple numeric experiments.
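The mechanism described above can be illustrated with a small numerical sketch. This is not the authors' code: the toy network, the data, the hyperparameters, and the helper names (`model`, `act`, `train`) are all assumptions chosen for illustration. The group is $C_2$ acting on inputs by a reflection `R` and trivially on outputs, and the assumed action on parameters is $(W, v) \mapsto (WR, v)$, so the architecture space (all pairs $(W, v)$) is invariant under it. To make the symmetry exact for a finite ensemble, each random initialization is paired with its reflected copy; the paper instead considers ensembles whose initialization distribution is symmetric.

```python
import numpy as np

# Generator of C_2 acting on inputs in R^2: reflection in the x-axis.
R = np.diag([1.0, -1.0])

def model(W, v, X):
    """Toy network f_(W,v)(x) = v . tanh(W x), applied to each row of X."""
    return np.tanh(X @ W.T) @ v

def act(W, v):
    """Assumed action of the reflection on parameters: (W, v) -> (W R, v)."""
    return W @ R, v

def train(W, v, X, y, steps=100, lr=0.05):
    """Full-batch gradient descent on the fully augmented squared loss."""
    Xa = np.vstack([X, X @ R])    # every sample together with its reflection
    ya = np.concatenate([y, y])   # labels unchanged (trivial action on outputs)
    W, v = W.copy(), v.copy()
    for _ in range(steps):
        H = np.tanh(Xa @ W.T)     # hidden activations, shape (2n, h)
        err = H @ v - ya          # residuals, shape (2n,)
        grad_v = H.T @ err / len(ya)
        grad_W = (err[:, None] * (1.0 - H**2) * v).T @ Xa / len(ya)
        W -= lr * grad_W
        v -= lr * grad_v
    return W, v

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = X[:, 0] + 0.5 * X[:, 1]       # the target itself is not symmetric
X_test = rng.normal(size=(50, 2))

members = []
param_gap = 0.0
for _ in range(8):
    W0, v0 = rng.normal(size=(6, 2)), rng.normal(size=6)
    W1, v1 = train(W0, v0, X, y)            # train from the initialization
    W2, v2 = train(*act(W0, v0), X, y)      # train from its reflected copy
    # Training commutes with the group action: (W2, v2) should equal act(W1, v1).
    W1r, v1r = act(W1, v1)
    param_gap = max(param_gap, np.abs(W2 - W1r).max(), np.abs(v2 - v1r).max())
    members += [(W1, v1), (W2, v2)]

def ensemble(Xq):
    """Mean prediction over all ensemble members."""
    return np.mean([model(W, v, Xq) for W, v in members], axis=0)

# Deviation from invariance, f(R x) - f(x), on held-out points.
ens_gap = np.abs(ensemble(X_test @ R) - ensemble(X_test)).max()
single_gap = np.abs(model(*members[0], X_test @ R) - model(*members[0], X_test)).max()

print("training equivariance gap:", param_gap)
print("ensemble invariance gap:  ", ens_gap)
print("single-member gap:        ", single_gap)
```

In this sketch the ensemble mean is invariant up to floating-point error, while an individual member generally is not. No infinite-width limit is involved; the argument only uses that training commutes with the group action and that the set of initializations is symmetric.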

Paper Structure (40 sections, 19 theorems, 83 equations, 12 figures, 5 tables)

Introduction
Literature review
Contribution
More involved architectures
Random augmentations
Finite ensembles
Limitations
Paper outline
Preliminaries
Training algorithms
Ensembles
The architecture space formalism
The invariance criterion
Equivariance of augmented neural network training
Gradient flow
...and 25 more sections

Key Result

Lemma 2.1

Let $\Phi_A$ be a neural network. Then, under Assumption (1) that all non-linearities are equivariant, we have that, for every $g\in G$ and $x\in X$.

Figures (12)

Figure 1: A graphical illustration of our results. The symmetry group here is $C_2$, acting on the parameter space through reflection in the $x$-axis, so that the $x$-axis corresponds to equivariant models. Snapshots of the parameters of ensemble members as they are trained on symmetric data are shown. At all times, most individual ensemble members do not lie on the line of symmetric models. However, their distribution is always symmetric about the $x$-axis -- and therefore, their mean always corresponds to an equivariant model.
Figure 2: Symmetric (left) and asymmetric (right) filter. Indices in the support are grey.
Figure 3: The architecture used in our neural networks. The convolutions have filters with support as in Figure 2 (left or right). LN stands for LayerNorm and FC stands for Fully-Connected.
Figure 4: Metrics after the 10th epoch for different ensemble sizes for the $C_4$ experiment (top) and $C_{16}$ experiment (bottom). Each data point is a mean of 30 bootstrapped examples -- the error bars denote one standard deviation of the bootstrap. The $x$-scale in the top plots is logarithmic, both scales are logarithmic in the bottom plots. Best viewed in color.
Figure 5: Metrics after the 10th epoch for different ensemble sizes for the ModelNet experiment. Each data point is a mean of 30 bootstrapped examples -- the error bars denote one standard deviation of the bootstrap. The $x$-scale in the top plots is logarithmic, both scales are logarithmic in the bottom plots.
...and 7 more figures

Theorems & Definitions (56)

Example 2.1: Trivial representation
Example 2.2: Discrete rotations of images
Definition 1: Representation on $\mathop{\mathrm{Hom}}\nolimits$
Example 2.3
Remark 1
Definition 2: Training algorithm
Definition 3: Iterative training algorithm
Remark 2
Definition 4: Equivariant training algorithm
Lemma 2.1: Equivariance to joint transformation
...and 46 more
