How unconstrained machine-learning models learn physical symmetries

Michelangelo Domina, Joseph William Abbott, Paolo Pegolo, Filippo Bigi, Michele Ceriotti

Abstract

The requirement of generating predictions that exactly fulfill the fundamental symmetries of the corresponding physical quantities has profoundly shaped the development of machine-learning models for physical simulations. In many cases, models are built using constrained mathematical forms that enforce the symmetries exactly. However, unconstrained models that do not obey rotational symmetries are often found to have competitive performance, and to be able to \emph{learn}, to a high level of accuracy, an approximately equivariant behavior with a simple data-augmentation strategy. In this paper, we introduce rigorous metrics to measure the symmetry content of the learned representations in such models, and to assess the accuracy with which the outputs fulfill the equivariance condition. We apply these metrics to two unconstrained, transformer-based models operating on decorated point clouds (a graph neural network for atomistic simulations and a PointNet-style architecture for particle physics) to investigate how symmetry information is processed across architectural layers and learned during training. Based on these insights, we establish a rigorous framework for diagnosing spectral failure modes in ML models. Enabled by this analysis, we demonstrate that one can achieve superior stability and accuracy by strategically injecting the minimum required inductive biases, preserving the high expressivity and scalability of unconstrained architectures while guaranteeing physical fidelity.

Paper Structure

This paper contains 10 sections, 7 equations, and 9 figures.

Figures (9)

  • Figure 1: Overview of the structure of a symmetry-aware ML model, the conditions of group equivariance, and the symmetry diagnostic metrics introduced in this work. a) The ML model is represented by a generic smooth function, $f$, that predicts the physical properties (tensors of different rank), $\hat{y}$, of an input, e.g. a decorated point cloud, $x$; $f$ can be a symmetry-preserving (i.e. equivariant) or unconstrained model. b) Group equivariance is preserved if and only if the model predictions transform like the inputs under the action of the appropriate group symmetry operations. c) The metrics $A_{\alpha}$ and $B_{\alpha}$ introduced in this work quantify the equivariance error of model predictions and the group symmetry content of internal features. For a set of inputs generated by Haar integration of $x$ over the group, the equivariance error, $A_{\alpha}$, is given by the variance of the back-transformed model predictions, while the character projections, $B_{\alpha}$, give the group symmetry decomposition of model features from arbitrary layers (a minimal Monte Carlo sketch of the $A_{\alpha}$ estimator is given after this list).
  • Figure 2: Equivariance diagnostics for a PET MLIP. Top: distributions of the absolute error (AE) and equivariance error, $A_\alpha$, for the energy $E$, non-conservative forces $\mathbf{f}_{\text{NC}}$, and non-conservative stress $\mathbf{S}_{\text{NC}}$. The arrows on the x-axis indicate the distribution medians. Bottom: normalized character projections, $B_\alpha$, for the corresponding quantities as a function of the probed angular momentum channel $\lambda$ (a sketch of this projection is given after this list). Solid lines and markers are averaged over 150 randomly sampled test structures, while faint lines show the individual structure projections.
  • Figure 3: (a) Character projection heatmaps report the magnitude of the character projection $B_\alpha$ for a quantity as a function of the character $\alpha$ of the relevant group and across successive epochs of a training run. In this case the characters are the $(\lambda, \sigma)$ irreps of the O(3) group. (b) Training curves and character heatmaps for the energy, $E$, non-conservative forces, $\mathbf{f}_{\text{NC}}$, and the two irreducible spherical components of the non-conservative stress, $\mathbf{S}_{\text{NC}}$.
  • Figure 4: Overview of a PET MLIP architecture and dynamical evolution of the internal features. (a) Local atomic environments of atoms in a molecule/material are represented as decorated point clouds. The atomic species of the central atom ($a_i$) and its neighbors ($a_{ij}$), together with the edge vectors ($\mathbf{r}_{ij}$) and their magnitudes ($r_{ij}$), form the inputs to PET. (b) Backbone modules: embedding modules map the inputs to the latent space via Center, Neighbor, and Geometry Embedders. These feed into one Edge Embedder per GNN layer, which then enters the Transformer Layers. (c) Full architecture: the complete model pipeline leading to predictions for the energy $E$, non-conservative forces $\mathbf{f}_\text{NC}$, and non-conservative stress $\mathbf{S}_\text{NC}$. The following architecture hyperparameters were chosen (collected in the configuration sketch after this list): number of GNN layers $n_{\text{GNN}} = 2$, number of transformer layers $n_{\text{TL}} = 2$, cutoff $r_{\text{cut}} = 4.5$ Å, edge feature size $d_{\text{PET}} = 128$, node feature size $d_{\text{node}} = 256$, LLF size $d_{\text{head}} = 128$. The surrounding heatmaps, labeled with red letters (A--L), correspond to specific points in the architecture. They display the intensity of the normalized character projections, $B_\alpha/\langle\|t\|_2^2\rangle_G$, as a function of the training epoch ($x$-axis, log scale) and the probed $\lambda$ channel ($y$-axis). Within each heatmap group, the top and bottom panels correspond to the two $\sigma$ channels (as detailed in heatmap A). The thin, isolated column on the far left of each heatmap represents the untrained model, denoted by "U" on the $x$-axis.
  • Figure 5: Model error and equivariance error for the energy $E$, non-conservative forces $\mathbf{f}_\text{NC}$, and the $\lambda=0,2$ components of the non-conservative stress $\mathbf{S}_\text{NC}$ for a universal PET model (trained and tested on MAD-1.5) whose energy and non-conservative-force readout layers have been retrained with a loss $L=L_\mu + \gamma L_\sigma$ combining the model error and an equivariance-error penalty (a minimal sketch of such a combined loss is given after this list). The marker colors correspond to the weighting of the equivariance penalty, $\gamma$.
  • ...and 4 more figures
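
The figure captions above reference the two diagnostics introduced in the paper; the short sketches below illustrate how such quantities could be estimated in practice. First, the equivariance error $A_\alpha$ of Figure 1c can be approximated by Monte Carlo sampling of the Haar measure over SO(3): rotated copies of an input are passed through the model, the predictions are transformed back, and the residual variance across samples measures the deviation from exact equivariance. This is a minimal sketch, assuming a model `f` that maps an $(N, 3)$ point cloud to vector-valued predictions; it is not the authors' implementation.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def equivariance_error(f, x, n_samples=256, seed=0):
    """Monte Carlo estimate of the equivariance error A_alpha for a
    vector-valued model f evaluated on a point cloud x of shape (N, 3).
    If f were exactly equivariant, all back-rotated predictions would
    coincide, so their variance measures the symmetry breaking."""
    rotations = Rotation.random(n_samples, random_state=seed)  # Haar-uniform on SO(3)
    back_rotated = []
    for i in range(n_samples):
        R = rotations[i]
        y = f(R.apply(x))                      # predict on the rotated input
        back_rotated.append(R.inv().apply(y))  # undo the rotation on the output
    back_rotated = np.stack(back_rotated)      # (n_samples, N, 3)
    return back_rotated.var(axis=0).sum()      # residual variance across samples
```

For scalar outputs such as the energy, the back-transformation is the identity and the estimator reduces to the variance of the predictions over rotated copies of the input.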
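Second, the normalized character projections $B_\lambda/\langle\|t\|_2^2\rangle_G$ shown in Figures 2-4 can be estimated with the same Haar sampling, combined with the SO(3) characters $\chi_\lambda(\theta) = \sin((\lambda + \tfrac{1}{2})\theta)/\sin(\theta/2)$. The sketch below uses the textbook character projector for compact groups, $B_\lambda = (2\lambda+1)\,\langle \chi_\lambda(R)\, t(x)\cdot t(Rx)\rangle_R$; the feature-extraction interface `t` is a hypothetical stand-in for reading out an internal layer, and this is not necessarily the paper's exact estimator.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def character_projection(t, x, lam, n_samples=512, seed=0):
    """Monte Carlo estimate of the normalized character projection
    B_lam / <||t||^2>_G for features t(x) of a point cloud x (shape (N, 3)).
    t must return a flat feature vector extracted from some layer."""
    rotations = Rotation.random(n_samples, random_state=seed)  # Haar-uniform
    t0 = t(x)
    overlaps, norms = [], []
    for i in range(n_samples):
        R = rotations[i]
        theta = R.magnitude()  # rotation angle in [0, pi]
        # SO(3) character chi_lam(theta); the theta -> 0 limit is 2*lam + 1
        if theta < 1e-8:
            chi = 2 * lam + 1
        else:
            chi = np.sin((lam + 0.5) * theta) / np.sin(theta / 2)
        t_rot = t(R.apply(x))
        overlaps.append(chi * np.dot(t0, t_rot))
        norms.append(np.dot(t_rot, t_rot))
    b_lam = (2 * lam + 1) * np.mean(overlaps)  # projection onto the lam irrep
    return b_lam / np.mean(norms)              # normalize by <||t||^2>_G
```

The paper's heatmaps use the $(\lambda, \sigma)$ irreps of O(3); extending the sketch from SO(3) to O(3) amounts to also sampling improper rotations and tracking the parity sign $\sigma$.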
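For reference, the hyperparameters quoted in the Figure 4 caption can be gathered into a single configuration block. The key names below are illustrative placeholders, not the actual PET configuration schema.

```python
# Illustrative grouping of the PET hyperparameters from the Figure 4 caption;
# key names are placeholders, not the actual PET configuration schema.
pet_hyperparameters = {
    "n_gnn_layers": 2,          # number of GNN (message-passing) layers
    "n_transformer_layers": 2,  # transformer layers per GNN layer
    "r_cut_angstrom": 4.5,      # neighbor-list cutoff radius
    "d_pet": 128,               # edge feature size
    "d_node": 256,              # node feature size
    "d_head": 128,              # LLF size, d_head in the caption
}
```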
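Finally, the retraining protocol of Figure 5 minimizes $L = L_\mu + \gamma L_\sigma$, combining the model error with an equivariance-error penalty. A minimal PyTorch-style sketch is given below, assuming the penalty is the variance of back-rotated predictions over a set of sampled rotations of each structure; the function signature and augmentation scheme are assumptions, not the authors' training code.

```python
import torch

def combined_loss(model, x, y_target, rotations, gamma):
    """L = L_mu + gamma * L_sigma: mean-squared model error plus an
    equivariance penalty given by the variance of back-rotated predictions.

    x: (N, 3) positions; y_target: (N, 3) reference vectors (e.g. forces);
    rotations: iterable of (3, 3) rotation matrices sampled from the Haar measure.
    """
    preds = []
    for R in rotations:
        y = model(x @ R.T)   # predict on the rotated structure
        preds.append(y @ R)  # rotate the prediction back to the original frame
    preds = torch.stack(preds)  # (K, N, 3)

    loss_mu = torch.mean((preds.mean(dim=0) - y_target) ** 2)  # model error L_mu
    loss_sigma = preds.var(dim=0).mean()                       # equivariance penalty L_sigma
    return loss_mu + gamma * loss_sigma
```

Sweeping $\gamma$ then trades model error against equivariance error, as in Figure 5, where the marker colors encode the penalty weighting.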