
Relaxed Equivariance via Multitask Learning

Ahmed A. Elhag, T. Konstantin Rusch, Francesco Di Giovanni, Michael Bronstein

TL;DR

This work addresses the cost and rigidity of strictly equivariant architectures by introducing REMUL, a multitask training procedure that learns approximate equivariance for unconstrained networks through a tunable equivariance loss. The training objective is L_total = α L_obj + β L_equi, and α and β can be adjusted adaptively (including with GradNorm-based strategies), so REMUL balances task performance against symmetry enforcement. Empirically, REMUL achieves results competitive with fully equivariant baselines on N-body dynamics, motion capture, and molecular dynamics (MD17), while delivering substantial speedups in inference (up to 10×) and training (up to 2.5×). The approach offers a practical and flexible way to leverage roto-translational symmetry in unconstrained architectures, provides concrete metrics for quantifying learned equivariance, and shows that the optimal level of symmetry is task-dependent.
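The objective L_total = α L_obj + β L_equi can be illustrated with a minimal plain-Python sketch. The summary does not spell out L_equi, so this sketch assumes it is the mean squared discrepancy between f(g·x) and g·f(x) for one randomly sampled roto-translation g (rotation R, translation t), and that the model outputs 3D positions that transform like the inputs (as in N-body position prediction). All names here (random_rotation, transform, remul_loss, and the toy models contract and shift) are hypothetical illustrations, not the authors' code.

```python
import math
import random
from typing import Callable, List, Tuple

Points = List[List[float]]  # N points, each [x, y, z]
Matrix = List[List[float]]  # 3x3 rotation matrix


def random_rotation(rng: random.Random) -> Matrix:
    """Uniformly random 3D rotation built from a normalized Gaussian quaternion."""
    w, x, y, z = (rng.gauss(0.0, 1.0) for _ in range(4))
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def transform(rot: Matrix, t: List[float], pts: Points) -> Points:
    """Apply the roto-translation p -> R p + t to every point."""
    return [[sum(rot[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)] for p in pts]


def mse(a: Points, b: Points) -> float:
    """Mean squared error over all points and coordinates."""
    diffs = [(u - v) ** 2 for pa, pb in zip(a, b) for u, v in zip(pa, pb)]
    return sum(diffs) / len(diffs)


def remul_loss(f: Callable[[Points], Points], x: Points, y: Points,
               alpha: float, beta: float,
               rng: random.Random) -> Tuple[float, float, float]:
    """Return (L_total, L_obj, L_equi) for one sample and one sampled g."""
    l_obj = mse(f(x), y)  # primary task loss
    rot = random_rotation(rng)
    t = [rng.gauss(0.0, 1.0) for _ in range(3)]
    # Assumed equivariance loss: compare f(g.x) with g.f(x).
    l_equi = mse(f(transform(rot, t, x)), transform(rot, t, f(x)))
    return alpha * l_obj + beta * l_equi, l_obj, l_equi


def contract(pts: Points) -> Points:
    """Toy E(3)-equivariant model: move each point halfway to the centroid."""
    c = [sum(p[i] for p in pts) / len(pts) for i in range(3)]
    return [[p[i] + 0.5 * (c[i] - p[i]) for i in range(3)] for p in pts]


def shift(pts: Points) -> Points:
    """Toy model that is not rotation-equivariant: adds a fixed vector."""
    return [[p[0] + 1.0, p[1], p[2]] for p in pts]
```

With contract, L_equi is zero up to floating-point error; with shift it is positive, because the fixed offset does not rotate along with the input. Increasing β penalizes that discrepancy more heavily relative to the task loss.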

Abstract

Incorporating equivariance as an inductive bias into deep learning architectures to exploit symmetries in the data has been successful in multiple applications, such as chemistry and dynamical systems. In particular, roto-translations are crucial for effectively modeling geometric graphs and molecules, where understanding 3D structure improves generalization. However, strictly equivariant models are often challenging to use because of their higher computational complexity. In this paper, we introduce REMUL, a training procedure that learns approximate equivariance for unconstrained networks via multitask learning. By formulating equivariance as a tunable objective alongside the primary task loss, REMUL offers a principled way to control the degree of approximate symmetry, relaxing the rigid constraints of traditional equivariant architectures. We show that unconstrained models (which do not build equivariance into the architecture) can learn approximate symmetries by minimizing an additional, simple equivariance loss. This enables quantitative control over the trade-off between enforcing equivariance constraints and optimizing task-specific performance. Our method achieves performance competitive with equivariant baselines while being significantly faster (up to 10× at inference and 2.5× at training), offering a practical and adaptable approach to leveraging symmetry in unconstrained architectures.
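The TL;DR notes that α and β can be adjusted adaptively, including with GradNorm-based strategies. The sketch below shows one weight update in the spirit of GradNorm (Chen et al., 2018), assuming the training framework has already computed, for each unweighted loss L_i, the gradient norm with respect to the shared parameters. The function name gradnorm_step, the learning rate, and the asymmetry value are illustrative assumptions, not REMUL's actual settings.

```python
import math
from typing import List


def gradnorm_step(weights: List[float], grad_norms: List[float],
                  losses: List[float], initial_losses: List[float],
                  asymmetry: float = 1.5, lr: float = 0.025) -> List[float]:
    """One GradNorm-style update of task weights such as [alpha, beta] (hypothetical helper).

    grad_norms[i] is ||grad_W L_i|| of the unweighted loss L_i with respect to
    the shared parameters W, assumed to be computed elsewhere.
    """
    n = len(weights)
    g = [w * gn for w, gn in zip(weights, grad_norms)]           # G_i = ||grad(w_i L_i)||
    g_mean = sum(g) / n
    ratios = [l / l0 for l, l0 in zip(losses, initial_losses)]   # L_i(t) / L_i(0)
    r_mean = sum(ratios) / n
    # Target norms, held constant during the weight update.
    targets = [g_mean * (r / r_mean) ** asymmetry for r in ratios]
    # d/dw_i |w_i * ||grad L_i|| - target_i| = sign(G_i - target_i) * ||grad L_i||
    grads = [math.copysign(gn, gi - ti) if gi != ti else 0.0
             for gi, ti, gn in zip(g, targets, grad_norms)]
    new = [max(w - lr * d, 1e-8) for w, d in zip(weights, grads)]  # keep weights positive
    scale = n / sum(new)                                           # renormalize so sum(w_i) = n
    return [w * scale for w in new]
```

For example, when the objective's gradient dominates (grad_norms=[10.0, 1.0] with equal loss ratios), one step lowers α and raises β while keeping α + β = 2; when the weighted gradient norms already match their targets, the weights are left unchanged.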

Paper Structure

This paper contains 35 sections, 1 theorem, 17 equations, 12 figures, 9 tables, 1 algorithm.

Key Result

Proposition 1

Let $f_{\alpha,\beta} \in \arg\min_{f\in\mathcal{H}}\widehat{\mathcal{L}}_{\mathrm{total}}(f; \alpha, \beta)$ be an empirical minimizer of the REMUL objective, and let $f^\star_{\mathrm{obj}} \in \arg\min_{f\in\mathcal{H}} \widehat{\mathcal{L}}_{\mathrm{obj}}(f)$ be an empirical minimizer for the ob

Figures (12)

  • Figure 1: N-body dynamical system. Each row represents a different evaluation scenario. Top: in-distribution performance; middle: out-of-distribution performance; bottom: equivariance error. The columns correspond to different architectures/model conditions: (a) Transformer trained with REMUL (gradual penalty), (b) Transformer trained with a constant penalty, (c) baselines (equivariant models, standard Transformer, and data augmentation). We conclude that training the Transformer with a high $\beta$ reduces the equivariance error and improves performance.
  • Figure 2: Motion Capture dataset: Transformer trained with REMUL. There is a trade-off between model performance and equivariance error: a higher penalty $\beta$ yields lower equivariance error (a more equivariant model), but the best performance on both tasks comes at an intermediate level of equivariance.
  • Figure 3: Computational time for the GATr and Transformer architectures. GATr is the slowest in all scenarios. Inference times are the same for all versions of the Transformer (standard, trained with the equivariance loss, and trained with data augmentation).
  • Figure 4: Loss surface around local minima of trained models on N-body dynamical system.
  • Figure 5: N-body dynamical system: the second equivariance measure, $E'$. (a) Transformer trained with REMUL (gradual penalty), (b) Transformer trained with REMUL (constant penalty), and (c) baselines: equivariant models, standard Transformer, and data augmentation.
  • ...and 7 more figures
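Figure 1 contrasts training with a gradual penalty against a constant one. The summary does not specify the gradual schedule, so the sketch below assumes a linear ramp of β from 0 to beta_max over a hypothetical warmup_epochs window, after which β stays fixed; passing gradual=False gives the constant-penalty variant. The function and parameter names are illustrative only.

```python
def beta_schedule(epoch: int, beta_max: float, warmup_epochs: int,
                  gradual: bool = True) -> float:
    """Equivariance weight beta at a given epoch (hypothetical schedule)."""
    if not gradual or warmup_epochs <= 0:
        return beta_max  # constant penalty from the first epoch
    # Assumed gradual penalty: linear ramp up to beta_max, then constant.
    return beta_max * min(1.0, epoch / warmup_epochs)
```

A ramp lets the network first fit the task loss before the symmetry term dominates; whether REMUL's gradual variant uses a linear ramp or another shape is not stated here.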

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Proposition 1
  • Proof
  • Proof