Ensembles provably learn equivariance through data augmentation
Oskar Nordenfors, Axel Flinth
TL;DR
The paper asks whether ensembles trained with data augmentation become equivariant outside the neural tangent kernel (NTK) regime, and shows that they do for general architectures under a simple condition: the architecture space must be invariant to the group action. It proves that gradient flow with full augmentation and SGD with random augmentations both produce equivariant ensembles when the architecture space and the loss are ρ-invariant, and it provides finite-ensemble sample complexity bounds. The contribution is to unify prior NTK-based results and extend them to realistic settings, including transformers and stochastic augmentation, and to validate the theory with experiments on discrete rotations and continuous SO(3) rotations. The work broadens the theoretical foundation for exploiting symmetries in ensemble methods and informs architecture design and augmentation strategies for improved equivariance in practice.
Abstract
Recently, it was proved that group equivariance emerges in ensembles of neural networks as the result of full augmentation in the limit of infinitely wide neural networks (neural tangent kernel limit). In this paper, we extend this result significantly. We provide a proof that this emergence does not depend on the neural tangent kernel limit at all. We also treat stochastic settings and general architectures. For the latter, we provide a simple sufficient condition on the relation between the architecture and the action of the group for our results to hold. We validate our findings through simple numerical experiments.
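
The mechanism behind this emergence can be illustrated with a toy example. The sketch below is not taken from the paper's experiments; every name in it (`ROTATIONS`, `conjugate`, `train`, `equivariance_error`) is hypothetical, and it makes several simplifying assumptions: a linear model f_A(x) = A x on R^2, the cyclic group C4 acting by 90-degree rotations, plain gradient descent on a fully augmented squared loss, and an initial ensemble that is made exactly group-invariant by including every rotated copy R A R^T of each random initialization. Under these assumptions the individual members are not equivariant after a few training steps, while the ensemble mean commutes with every rotation up to floating-point error.

```python
import random

# C4 acts on R^2 by rotations through multiples of 90 degrees.
ROTATIONS = [
    [[1, 0], [0, 1]],
    [[0, -1], [1, 0]],
    [[-1, 0], [0, -1]],
    [[0, 1], [-1, 0]],
]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def matvec(a, v):
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]

def conjugate(r, a):
    """Assumed group action on parameters: A -> R A R^T."""
    return matmul(matmul(r, a), transpose(r))

def grad_step(a, data, lr):
    """One gradient-descent step on the fully augmented squared loss."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    n = len(data) * len(ROTATIONS)
    for r in ROTATIONS:
        for x, y in data:
            rx, ry = matvec(r, x), matvec(r, y)  # augment input and target
            pred = matvec(a, rx)
            for i in range(2):
                for j in range(2):
                    grad[i][j] += 2.0 * (pred[i] - ry[i]) * rx[j] / n
    return [[a[i][j] - lr * grad[i][j] for j in range(2)] for i in range(2)]

def train(a, data, lr=0.05, steps=10):
    # Deliberately few steps, so members stay far from the minimizer.
    for _ in range(steps):
        a = grad_step(a, data, lr)
    return a

def equivariance_error(a):
    """Largest entry of A R - R A over the group; 0 means equivariant."""
    err = 0.0
    for r in ROTATIONS:
        ar, ra = matmul(a, r), matmul(r, a)
        err = max(err, max(abs(ar[i][j] - ra[i][j])
                           for i in range(2) for j in range(2)))
    return err

rng = random.Random(0)
# Synthetic data with no built-in symmetry.
data = [([rng.gauss(0, 1), rng.gauss(0, 1)],
         [rng.gauss(0, 1), rng.gauss(0, 1)]) for _ in range(8)]

# Symmetrized initial ensemble: each random A0 together with its orbit.
seeds = [[[rng.gauss(0, 1), rng.gauss(0, 1)],
          [rng.gauss(0, 1), rng.gauss(0, 1)]] for _ in range(5)]
inits = [conjugate(r, a0) for a0 in seeds for r in ROTATIONS]

members = [train(a0, data) for a0 in inits]
mean = [[sum(m[i][j] for m in members) / len(members) for j in range(2)]
        for i in range(2)]

member_error = max(equivariance_error(m) for m in members)
ensemble_error = equivariance_error(mean)
print("worst member error:", member_error)
print("ensemble mean error:", ensemble_error)
```

The symmetrized initialization stands in for sampling from a group-invariant distribution: with i.i.d. Gaussian initializations and no symmetrization, the ensemble mean would be only approximately equivariant, with an error that shrinks as the ensemble grows. Because this toy loss is convex, each member would also converge to an equivariant solution if trained long enough, so the short training run is what keeps the contrast between members and ensemble visible here.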
