Relaxed Equivariance via Multitask Learning
Ahmed A. Elhag, T. Konstantin Rusch, Francesco Di Giovanni, Michael Bronstein
TL;DR
This work addresses the cost and rigidity of strictly equivariant architectures by introducing REMUL, a multitask training procedure that learns approximate equivariance for unconstrained networks through a tunable equivariance loss. By formulating the training objective as L_total = α L_obj + β L_equi and adaptively adjusting α and β (including GradNorm-based strategies), REMUL balances task performance against symmetry enforcement. Empirically, REMUL achieves competitive results against fully equivariant baselines across N-body dynamics, motion capture, and molecular dynamics (MD17) while delivering substantial speedups in inference (up to 10×) and training (up to 2.5×). The approach provides a practical and flexible means to leverage roto-translational symmetry in unconstrained architectures and offers concrete metrics to quantify learned equivariance, with task-dependent optimal levels of symmetry.
Abstract
Incorporating equivariance as an inductive bias into deep learning architectures, in order to exploit symmetries in the data, has been successful in multiple applications such as chemistry and dynamical systems. In particular, equivariance to roto-translations is crucial for effectively modeling geometric graphs and molecules, where understanding the 3D structure enhances generalization. However, strictly equivariant models are often costly because of their higher computational complexity. In this paper, we introduce REMUL, a training procedure that learns \emph{approximate} equivariance for unconstrained networks via multitask learning. By formulating equivariance as a tunable objective alongside the primary task loss, REMUL offers a principled way to control the degree of approximate symmetry, relaxing the rigid constraints of traditional equivariant architectures. We show that unconstrained models (which do not build equivariance into the architecture) can learn approximate symmetries by minimizing a simple additional equivariance loss. This enables quantitative control over the trade-off between enforcing equivariance constraints and optimizing for task-specific performance. Our method achieves competitive performance compared to equivariant baselines while being significantly faster (up to 10$\times$ at inference and up to 2.5$\times$ in training), offering a practical and adaptable approach to leveraging symmetry in unconstrained architectures.
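To make the weighted objective concrete, the following is a minimal NumPy sketch of a two-term loss of the form α L_obj + β L_equi. It is an illustration under stated assumptions, not the paper's implementation: the function names (`random_rotation`, `equivariance_loss`, `total_loss`), the mean-squared-error form of both terms, the number of sampled transformations, and the assumption that the model maps an (N, 3) array of coordinates to an (N, 3) output that should transform like its input are all hypothetical choices made here. The weights are held fixed; the adaptive schemes mentioned above (such as GradNorm) are not shown.

```python
import numpy as np

def random_rotation(rng):
    """Sample a random 3x3 rotation matrix from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # normalize column signs of the orthogonal factor
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]        # flip one column so that det(q) = +1
    return q

def equivariance_loss(model, x, rng, n_samples=8):
    """Mean squared gap between f(g.x) and g.f(x) over sampled roto-translations g.

    Assumption: `model` maps (N, 3) coordinates to (N, 3) outputs.
    """
    total = 0.0
    for _ in range(n_samples):
        R = random_rotation(rng)
        t = rng.normal(size=(1, 3))
        out_of_transformed = model(x @ R.T + t)   # f(g.x)
        transformed_out = model(x) @ R.T + t      # g.f(x)
        total += np.mean((out_of_transformed - transformed_out) ** 2)
    return total / n_samples

def total_loss(model, x, y, alpha, beta, rng):
    """Weighted sum alpha * L_obj + beta * L_equi, with fixed (non-adaptive) weights."""
    l_obj = np.mean((model(x) - y) ** 2)          # hypothetical MSE task loss
    l_equi = equivariance_loss(model, x, rng)
    return alpha * l_obj + beta * l_equi
```

In this sketch, a map that is exactly roto-translation equivariant (for example, scaling each point's offset from the centroid) yields an equivariance loss of zero up to floating-point error, while a map such as per-axis rescaling does not. Setting `beta = 0` recovers ordinary unconstrained training, and larger `beta` pushes the model toward stricter symmetry.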
