Symmetry Induces Structure and Constraint of Learning
Liu Ziyin
TL;DR
This work addresses how loss-function symmetries shape learning in deep networks by introducing a unified mirror-reflection framework. Its central result is that any $O$-mirror symmetry of the loss induces a constraint $O^T\theta=0$, and that SGD with sufficiently large weight decay or gradient noise drives training toward these symmetry-constrained solutions, yielding structured phenomena such as sparsity, low-rankness, and homogeneous ensembling. The paper extends the theory with an L1-equivalence view and a differentiable constraint algorithm (DCS) to enforce symmetry-induced constraints in practice, and validates the framework across rescaling, rotation, and permutation symmetries, including experiments on linear regression, matrix factorization, CIFAR-10 with ResNet18, and transformers. The findings offer a principled explanation for loss of plasticity and various collapse phenomena in neural networks, and provide practical design guidance for enforcing or removing symmetries to tailor model capacity and representation structure.
Abstract
Due to common architecture designs, symmetries exist extensively in contemporary neural networks. In this work, we unveil the importance of the loss function symmetries in affecting, if not deciding, the learning behavior of machine learning models. We prove that every mirror-reflection symmetry, with reflection surface $O$, in the loss function leads to the emergence of a constraint on the model parameters $\theta$: $O^T\theta=0$. This constraint becomes satisfied when either the weight decay or the gradient noise is large. Common instances of mirror symmetries in deep learning include rescaling, rotation, and permutation symmetry. As direct corollaries, we show that rescaling symmetry leads to sparsity, rotation symmetry leads to low-rankness, and permutation symmetry leads to homogeneous ensembling. Then, we show that the theoretical framework can explain intriguing phenomena, such as the loss of plasticity and various collapse phenomena in neural networks, and suggest how symmetries can be used to design an elegant algorithm to enforce hard constraints in a differentiable way.
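The claim that large weight decay pushes parameters onto the symmetry-constrained solution can be illustrated on a two-parameter toy problem. The sketch below is not taken from the paper; the loss $(uw - y)^2$, the function names `loss`, `grad`, and `train`, and all hyperparameters are assumptions chosen for illustration. This loss has a rescaling symmetry, $(u, w) \to (\lambda u, w/\lambda)$, and is also invariant under the two reflections $(u, w) \to (w, u)$ and $(u, w) \to (-w, -u)$, so the symmetric point is $u = w = 0$. For this particular toy problem, plain gradient descent should collapse to that point once the weight decay coefficient exceeds $y$, and stay away from it otherwise.

```python
# Toy illustration (assumed setup, not the paper's experiments).
# loss(u, w) = (u*w - y)^2 is invariant under (u, w) -> (lam*u, w/lam)
# and under the reflections (u, w) -> (w, u) and (u, w) -> (-w, -u).

def loss(u, w, y):
    """Unregularized loss with rescaling symmetry."""
    return (u * w - y) ** 2

def grad(u, w, y, gamma):
    """Gradient of loss(u, w, y) + gamma * (u**2 + w**2)."""
    r = u * w - y
    return 2 * r * w + 2 * gamma * u, 2 * r * u + 2 * gamma * w

def train(gamma, y=0.5, u=1.0, w=0.3, lr=0.05, steps=5000):
    """Full-batch gradient descent with weight decay coefficient gamma."""
    for _ in range(steps):
        gu, gw = grad(u, w, y, gamma)
        u, w = u - lr * gu, w - lr * gw
    return u, w

if __name__ == "__main__":
    # Mirror symmetry check: swapping u and w leaves the loss unchanged.
    print(loss(0.7, -1.3, 0.5) == loss(-1.3, 0.7, 0.5))

    # Large weight decay (gamma > y): parameters collapse to u = w = 0.
    print("gamma=1.0:", train(gamma=1.0))

    # Small weight decay (gamma < y): a nonzero, balanced solution survives.
    print("gamma=0.1:", train(gamma=0.1))
```

In this toy setting the threshold $\gamma > y$ follows from the Hessian of the regularized loss at the origin, whose eigenvalues are $2\gamma \pm 2y$; it is specific to this example and should not be read as the paper's general condition.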
