Emergent Equivariance in Deep Ensembles

Jan E. Gerken; Pan Kessel

TL;DR

The paper studies how to make neural networks respect symmetries by combining deep ensembles with data augmentation. Using neural tangent kernel theory in the infinite-width limit, it proves that with full data augmentation the ensemble mean is equivariant under the symmetry group for all inputs and at all training times, even off the data manifold, while individual members need not be. It provides bounds for finite width and finite ensemble size, analyzes continuous versus discrete groups, and validates the theory on Ising models, FashionMNIST, and histological data, showing practical invariance gains and performance competitive with manifestly equivariant methods. The findings suggest a simple, scalable way to obtain emergent equivariance without bespoke architectures, with implications for robustness, uncertainty estimation, and symmetry-compliant learning in scientific and visual domains.
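A toy experiment can make the emergent part of this claim concrete. The sketch below is a hypothetical stand-in, not the paper's setup: each ensemble member is a random ReLU feature map with a ridge-regression readout instead of a trained network, the group is the four 90° rotations of a flattened 5×5 image, and all names (`augment`, `fit_member`, `invariance_error`) are illustrative. Every member is fitted to the same fully augmented training set, and invariance is measured on random inputs that are not in the training set.

```python
import numpy as np


def rot90_orbit(x, side):
    """C4 orbit of a flattened side x side image: rotations by 0, 90, 180 and 270 degrees."""
    img = x.reshape(side, side)
    return [np.rot90(img, k).ravel() for k in range(4)]


def augment(X, y, side):
    """Full data augmentation: every training image appears in all four rotations."""
    Xa = np.concatenate([np.stack(rot90_orbit(x, side)) for x in X])
    ya = np.repeat(y, 4)  # the label is shared by the whole orbit (invariant task)
    return Xa, ya


def fit_member(Xa, ya, width, rng, ridge=1e-3):
    """Hypothetical ensemble member: random ReLU features with a ridge-regression readout."""
    W = rng.standard_normal((width, Xa.shape[1]))

    def features(Z):
        return np.maximum(Z @ W.T, 0.0) / np.sqrt(width)

    F = features(Xa)
    beta = np.linalg.solve(F.T @ F + ridge * np.eye(width), F.T @ ya)
    return lambda Z: features(Z) @ beta


def invariance_error(predict, X_test, side):
    """Mean |f(g x) - f(x)| over test inputs x and the three non-trivial rotations g."""
    errs = []
    for x in X_test:
        orbit = rot90_orbit(x, side)
        base = predict(orbit[0][None])[0]
        errs += [abs(predict(gx[None])[0] - base) for gx in orbit[1:]]
    return float(np.mean(errs))


rng = np.random.default_rng(0)
side, n_train, n_members = 5, 20, 50
X = rng.standard_normal((n_train, side * side))
y = rng.standard_normal(n_train)
Xa, ya = augment(X, y, side)
members = [fit_member(Xa, ya, width=64, rng=rng) for _ in range(n_members)]


def ensemble(Z):
    """Ensemble prediction: the mean of the member outputs."""
    return np.mean([m(Z) for m in members], axis=0)


X_test = rng.standard_normal((10, side * side))  # random probes, not in the training set
member_err = float(np.mean([invariance_error(m, X_test, side) for m in members]))
ensemble_err = invariance_error(ensemble, X_test, side)
print(f"mean member error: {member_err:.4f}  ensemble error: {ensemble_err:.4f}")
```

In this toy model the outputs of one member at $gx$ and at $x$ have the same distribution (a pixel permutation leaves the Gaussian weight distribution unchanged and the augmented training set is closed under the group), so the ensemble mean is invariant in expectation and the residual error of a finite ensemble shrinks roughly like $1/\sqrt{M}$ in the number of members $M$. By the triangle inequality, the ensemble error as defined here can never exceed the average member error.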

Abstract

We show that deep ensembles become equivariant for all inputs and at all training times by simply using data augmentation. Crucially, equivariance holds off-manifold and for any architecture in the infinite width limit. The equivariance is emergent in the sense that predictions of individual ensemble members are not equivariant but their collective prediction is. Neural tangent kernel theory is used to derive this result and we verify our theoretical insights using detailed numerical experiments.
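The "all training times" statement can be illustrated with the closed-form mean of an infinitely wide ensemble trained by gradient flow on the squared loss, $\mu_t(x) = \Theta(x, X)\,\Theta(X, X)^{-1}\bigl(I - e^{-\eta\,\Theta(X, X)\,t}\bigr)\,y$, which assumes zero-mean initial outputs. The sketch replaces the NTK $\Theta$ by an RBF kernel, an assumption made only for brevity: like the NTK of a fully connected network, it is unchanged when both arguments are permuted by the same pixel permutation. With the fully augmented training set the mean prediction at random inputs should be invariant under 90° rotations at every sampled $t$, up to round-off; trained on the un-augmented data it is not. Function names are illustrative.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Stand-in kernel k(a, b) = exp(-gamma * |a - b|^2). It depends only on distances,
    so permuting the pixels of both arguments in the same way leaves it unchanged."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mean_prediction(X_test, X_train, y_train, t, lr=1.0, gamma=0.04):
    """mu_t(x) = k(x, X) K^{-1} (I - exp(-lr * K * t)) y with K = k(X, X)."""
    K = rbf_kernel(X_train, X_train, gamma)
    lam, V = np.linalg.eigh(K)
    lam = np.maximum(lam, 1e-12)  # guard against tiny negative eigenvalues from round-off
    coeff = V @ ((-np.expm1(-lr * lam * t) / lam) * (V.T @ y_train))
    return rbf_kernel(X_test, X_train, gamma) @ coeff


def rot90_flat(X, k, side):
    """Rotate every flattened side x side image in X by k * 90 degrees."""
    return np.rot90(X.reshape(-1, side, side), k, axes=(1, 2)).reshape(len(X), -1)


rng = np.random.default_rng(1)
side = 5
X = rng.standard_normal((15, side * side))
y = rng.standard_normal(15)
X_aug = np.concatenate([rot90_flat(X, k, side) for k in range(4)])
y_aug = np.tile(y, 4)  # copies are stacked rotation by rotation, so labels repeat blockwise
X_test = rng.standard_normal((8, side * side))  # random probes, not in the training set


def max_deviation(X_train, y_train, t):
    """Largest |mu_t(g x) - mu_t(x)| over test inputs and non-trivial rotations g."""
    base = mean_prediction(X_test, X_train, y_train, t)
    devs = [np.abs(mean_prediction(rot90_flat(X_test, k, side), X_train, y_train, t) - base).max()
            for k in range(1, 4)]
    return float(max(devs))


times = [0.1, 1.0, 10.0, 100.0]
aug_devs = [max_deviation(X_aug, y_aug, t) for t in times]
plain_devs = [max_deviation(X, y, t) for t in times]
for t, a, p in zip(times, aug_devs, plain_devs):
    print(f"t={t:>6}: augmented {a:.1e}, not augmented {p:.1e}")
```

The eigendecomposition evaluates the matrix function $\Theta^{-1}(I - e^{-\eta\Theta t})$ for several $t$ without forming an explicit inverse, and `expm1` keeps $1 - e^{-\eta\lambda t}$ accurate for small eigenvalues.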

Paper Structure (54 sections, 13 theorems, 105 equations, 16 figures, 1 table)

Introduction
Related Works
Deep Ensembles, Equivariance.
Equivariance without architecture constraints.
Neural Tangent Kernel.
Data augmentation and kernel machines.
Deep Ensembles and Neural Tangent Kernels
Deep Ensemble.
Relation to NTK.
Equivariance and Data Augmentation
Representations of Groups.
Equivariance.
Data Augmentation.
Emergent Equivariance for Large-Width Deep Ensembles
Assumptions.

Key Result

Theorem 5.1

Let $G$ be a group and $\rho_X$ a representation of $G$ acting on the input space $X$ as in (eq:rhoX). Then the neural tangent kernel $\Theta$, as defined in (eq:ntk_def), as well as the NNGP kernel $\mathcal{K}$, as defined in (eq:nngp_def), of a neural network satisfying the assumptions above transform as … for all $g \in G$ and $x,x' \in X$, where $\rho_{K}$ is a transformation acting on the spatial dimensions …
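For intuition, consider the simplest case, which is our assumption rather than a statement taken from the theorem: a single fully connected ReLU layer with standard Gaussian weights and an orthogonal representation $\rho_X$ (here 90° rotations of a flattened image, which permute pixels). Its NNGP kernel $\mathbb{E}_w[\mathrm{relu}(w \cdot x)\,\mathrm{relu}(w \cdot x')]$ has a closed form (the degree-1 arc-cosine kernel, up to a factor $1/2$) that depends only on $\lVert x\rVert$, $\lVert x'\rVert$ and $x \cdot x'$, so the kernel is simply invariant. For architectures whose kernels carry spatial indices, the theorem's $\rho_K$ acts on those indices instead. The sketch checks the closed form against a Monte Carlo estimate from a finite random layer and then checks invariance numerically; function names are illustrative.

```python
import numpy as np


def relu_nngp(x, xp):
    """Closed form of E_w[relu(w.x) relu(w.x')] for w ~ N(0, I):
    |x| |x'| (sin(theta) + (pi - theta) cos(theta)) / (2 pi), theta = angle(x, x')."""
    nx, nxp = np.linalg.norm(x), np.linalg.norm(xp)
    cos = np.clip(x @ xp / (nx * nxp), -1.0, 1.0)
    theta = np.arccos(cos)
    return nx * nxp * (np.sin(theta) + (np.pi - theta) * cos) / (2.0 * np.pi)


def relu_nngp_mc(x, xp, n_samples, rng):
    """Monte Carlo estimate of the same expectation using one finite random ReLU layer."""
    W = rng.standard_normal((n_samples, x.size))
    return float(np.mean(np.maximum(W @ x, 0.0) * np.maximum(W @ xp, 0.0)))


def rot90_flat(v, k, side):
    """Action of the rotation by k * 90 degrees on a flattened side x side image."""
    return np.rot90(v.reshape(side, side), k).ravel()


rng = np.random.default_rng(2)
side = 5
x = rng.standard_normal(side * side)
xp = 0.5 * x + rng.standard_normal(side * side)  # correlated pair, generic (not symmetric)

exact = relu_nngp(x, xp)
estimate = relu_nngp_mc(x, xp, n_samples=200_000, rng=rng)
deviations = [abs(relu_nngp(rot90_flat(x, k, side), rot90_flat(xp, k, side)) - exact)
              for k in range(1, 4)]
print(f"closed form {exact:.4f}, Monte Carlo {estimate:.4f}, max deviation {max(deviations):.1e}")
```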

Figures (16)

Figure 1: Invariance of predicted energies with respect to lattice rotations by $90^{\circ}$. Solid lines refer to predictions of individual ensemble members and their standard deviation, dashed lines refer to mean predictions of the ensemble. Zoom-ins in the second row show that the invariance of mean predictions converges to NTK invariance for large ensembles and network widths.
Figure 2: Emergent invariance for FashionMNIST. Left: Number of out-of-distribution MNIST samples with the same prediction across a symmetry orbit for group orders 4 (green), 8 (blue), and 16 (red) versus training epoch. The models were trained on augmented FashionMNIST. Solid lines show the ensemble prediction; the shaded area lies between the 25th and 75th percentiles of the predictions of individual ensemble members. Right: Out-of-distribution invariance in the same setup as on the left at group order 16. As the number of ensemble members increases, the prediction becomes more invariant, as expected.
Figure 3: Equivariance extends to $SO(2)$ symmetry. Reported is the fraction of randomly sampled rotations that leave the prediction invariant, for data augmentation with group order 4 (green), 8 (blue), and 16 (red). As expected, the equivariance increases with the group order.
Figure 4: Ensemble invariance on OOD data for ensembles trained on histological data. Number of OOD samples with the same prediction across a symmetry orbit for group orders 4 (blue), 8 (orange), 12 (green) and 16 (red) versus training epoch. Even for ensemble size 5 (left), the ensemble predictions (solid line) are more invariant than the ensemble members (shaded region corresponding to 25th to 75th percentile of ensemble members). The effect is larger for ensemble size 20 (right).
Figure 5: Difference in relative predicted total energy $\mathcal{E}$ between the ensembles and the NTK on the training data, in-distribution test data, and out-of-distribution data.
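Figure 1 tests invariance of predicted energies under 90° lattice rotations. That the target itself has this symmetry can be checked directly. The sketch assumes a nearest-neighbour Ising Hamiltonian with coupling $J$ on a periodic square lattice, since the summary does not state the exact model used in the paper; `ising_energy` is an illustrative name. Rotating the lattice maps horizontal bonds onto vertical ones and vice versa, so the energy is unchanged.

```python
import numpy as np


def ising_energy(spins, J=1.0):
    """E = -J * sum over nearest-neighbour pairs <ij> of s_i * s_j on a periodic lattice.
    Each bond is counted once: every site couples to its right and lower neighbour."""
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    return float(-J * np.sum(spins * right + spins * down))


rng = np.random.default_rng(3)
lattice = rng.choice([-1, 1], size=(8, 8))
energies = [ising_energy(np.rot90(lattice, k)) for k in range(4)]
print(energies)  # four identical values
```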
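Figures 2 and 4 count samples whose prediction is the same across a whole symmetry orbit. A minimal sketch of such a count follows. It assumes that the ensemble prediction is the argmax of the averaged member outputs and uses exact 90° rotations, so it only covers group order 4 (orders 8 and 16 would need interpolated rotations); the toy models and function names are hypothetical and only exercise the metric.

```python
import numpy as np


def orbit_consistent_count(predict_logits, images, transforms):
    """Number of images whose predicted class is the same for every transform in the orbit."""
    preds = np.stack([
        predict_logits(np.stack([t(img) for img in images])).argmax(axis=1)
        for t in transforms
    ])  # shape: (n_transforms, n_images)
    return int(np.sum(np.all(preds == preds[0], axis=0)))


def ensemble_logits(members):
    """Hypothetical ensemble output: the average of the member outputs."""
    return lambda batch: np.mean([m(batch) for m in members], axis=0)


# C4 orbit (group order 4) via exact 90-degree rotations.
c4 = [lambda img, k=k: np.rot90(img, k) for k in range(4)]


def invariant_model(batch):
    """Toy model whose two logits (pixel sum, scaled sum of squares) ignore pixel positions."""
    return np.stack([batch.sum(axis=(1, 2)), (batch ** 2).sum(axis=(1, 2)) / 10.0], axis=1)


def row_model(batch):
    """Toy model comparing the top and bottom rows, hence not rotation invariant."""
    return np.stack([batch[:, 0, :].sum(axis=1), batch[:, -1, :].sum(axis=1)], axis=1)


rng = np.random.default_rng(4)
images = rng.integers(0, 10, size=(20, 6, 6)).astype(float)
corner = np.zeros((1, 6, 6))
corner[0, 0, 0] = 1.0
print(orbit_consistent_count(invariant_model, images, c4))  # 20
print(orbit_consistent_count(row_model, corner, c4))  # 0
print(orbit_consistent_count(ensemble_logits([invariant_model] * 3), images, c4))  # 20
```

Integer-valued images keep the toy logits exact, so the invariant model's predictions cannot flip through floating-point round-off.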
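Figure 3 reports, for models trained with $C_N$ augmentation (group orders 4, 8 and 16), the fraction of randomly sampled rotations from the continuous group $SO(2)$ that leave the prediction unchanged. A minimal sketch of such a metric follows; the nearest-neighbour rotation, the uniform angle distribution and the function names are our assumptions, not the paper's evaluation code.

```python
import numpy as np


def augmentation_angles(group_order):
    """Rotation angles (degrees) of the cyclic group C_N used for data augmentation."""
    return [360.0 * k / group_order for k in range(group_order)]


def rotate_nearest(img, angle_deg):
    """Rotate a 2-D image about its centre with nearest-neighbour sampling.
    Output pixels whose source lies outside the frame are set to zero."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = np.deg2rad(angle_deg)
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: rotate each output coordinate back to find its source pixel.
    ys = np.cos(a) * (yy - cy) - np.sin(a) * (xx - cx) + cy
    xs = np.sin(a) * (yy - cy) + np.cos(a) * (xx - cx) + cx
    yi, xi = np.rint(ys).astype(int), np.rint(xs).astype(int)
    inside = (yi >= 0) & (yi < h) & (xi >= 0) & (xi < w)
    out = np.zeros_like(img)
    out[inside] = img[yi[inside], xi[inside]]
    return out


def fraction_invariant(predict, images, n_angles, rng):
    """Fraction of (image, uniformly random angle) pairs whose predicted class is unchanged."""
    hits = 0
    for img in images:
        base = predict(img)
        for angle in rng.uniform(0.0, 360.0, size=n_angles):
            hits += int(predict(rotate_nearest(img, angle)) == base)
    return hits / (len(images) * n_angles)


rng = np.random.default_rng(5)
images = rng.random((3, 28, 28))
constant_model = lambda img: 0  # trivially invariant toy classifier
print(augmentation_angles(4))  # [0.0, 90.0, 180.0, 270.0]
print(fraction_invariant(constant_model, images, 10, rng))  # 1.0
```

Evaluating with angles drawn from the full circle, rather than from the $C_N$ angles used in training, is what probes how far invariance extends to the continuous group.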

Theorems & Definitions (24)

Remark 3.1
Theorem 5.1: Kernel transformation
proof
Lemma 5.1: Shift of permutation
proof
Theorem 5.2: Emergent Equivariance of Deep Ensembles
proof
Lemma 6.0: Bound for finite ensemble members
Lemma 6.0: Bound for continuous groups
Theorem 2.1: Kernel transformation
