Scaling Laws and Symmetry, Evidence from Neural Force Fields

Khang Ngo, Siamak Ravanbakhsh

TL;DR

The paper studies how symmetry inductive biases affect scaling in neural network interatomic potentials (NNIPs) through a large-scale study of compute, data, and parameter scaling across unconstrained and equivariant architectures. It fits two scaling laws: a compute-frontier law $L(C)=L_\infty + F_c C^{-\gamma_c}$ and a sum-power law $L(N,D)=L_\infty + A N^{-\alpha} + B D^{-\beta}$, with architecture-dependent exponents $\gamma_c$, $\alpha$, and $\beta$. Higher-order equivariant representations yield larger exponents, so the performance gap between architectures widens with scale. Compute-optimal training favors increasing model size $N$ and data size $D$ in tandem, with allocation exponents $a,b \approx 0.5$. Enforcing symmetry through a loss term has limited impact on the compute-optimal frontier compared with an equivariant architecture, and the optimal depth grows with the degree of symmetry before saturating. These results have practical implications for scaling NNIPs and motivate the development of scalable, high-order equivariant models for atomistic simulations.
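To make the link between the sum-power law and the allocation exponents concrete, here is a minimal sketch of the standard constrained minimization. It assumes that compute is proportional to the product of model and data size, $C = \kappa N D$, which this summary does not state explicitly. Under that assumption, minimizing $A N^{-\alpha} + B D^{-\beta}$ gives $N_{opt} \propto C^{a}$ and $D_{opt} \propto C^{b}$ with $a = \beta/(\alpha+\beta)$ and $b = \alpha/(\alpha+\beta)$, so $a, b \approx 0.5$ corresponds to $\alpha \approx \beta$. The function name and all numeric constants are hypothetical and are not taken from the paper.

```python
def optimal_allocation(C, A, B, alpha, beta, kappa):
    """Minimise A*N**-alpha + B*D**-beta subject to C = kappa * N * D.

    Assumption (not from the paper summary): compute is proportional
    to the product of parameter count N and data size D.
    """
    budget = C / kappa                      # allowed value of the product N * D
    a = beta / (alpha + beta)               # N_opt scales as C**a
    b = alpha / (alpha + beta)              # D_opt scales as C**b
    # Prefactor from setting the derivative of the loss w.r.t. N to zero.
    G = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    N_opt = G * budget ** a
    D_opt = budget ** b / G
    return N_opt, D_opt, a, b


def reducible_loss(N, D, A, B, alpha, beta):
    """Reducible part of the sum-power law, i.e. L(N, D) - L_inf."""
    return A * N ** -alpha + B * D ** -beta


# Illustrative constants only; they are not fitted values from the paper.
N_opt, D_opt, a, b = optimal_allocation(
    C=1e12, A=2.0, B=2.0, alpha=0.3, beta=0.3, kappa=1.0
)
```

With equal exponents and equal prefactors, as in the illustrative call, the budget is split evenly in log space between parameters and data.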

Abstract

We present an empirical study in the geometric task of learning interatomic potentials, which shows equivariance matters even more at larger scales; we show a clear power-law scaling behaviour with respect to data, parameters and compute with "architecture-dependent exponents". In particular, we observe that equivariant architectures, which leverage task symmetry, scale better than non-equivariant models. Moreover, among equivariant architectures, higher-order representations translate to better scaling exponents. Our analysis also suggests that for compute-optimal training, the data and model sizes should scale in tandem regardless of the architecture. At a high level, these results suggest that, contrary to common belief, we should not leave it to the model to discover fundamental inductive biases such as symmetry, especially as we scale, because they change the inherent difficulty of the task and its scaling laws.
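As an illustration of what an architecture-dependent exponent means in practice, the sketch below estimates the exponent of a saturating power law $L(C) = L_\infty + F_c C^{-\gamma_c}$ from (compute, loss) points. It is a simple log-space least-squares fit with a grid scan over the loss floor, not the fitting procedure used by the authors, which this summary does not describe. The function names and the synthetic data are hypothetical.

```python
import math


def fit_power_law(C, L, L_inf):
    """Least-squares line fit of log(L - L_inf) = log(F_c) - gamma_c * log(C)."""
    xs = [math.log(c) for c in C]
    ys = [math.log(l - L_inf) for l in L]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Sum of squared residuals in log space, used to compare candidate floors.
    resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.exp(intercept), -slope, resid


def fit_with_floor(C, L, n_grid=200):
    """Scan candidate floors L_inf in [0, min(L)) and keep the best log-space fit."""
    top = min(L)
    best = None
    for i in range(n_grid):
        L_inf = top * i / n_grid
        F_c, gamma_c, resid = fit_power_law(C, L, L_inf)
        if best is None or resid < best[3]:
            best = (L_inf, F_c, gamma_c, resid)
    return best  # (L_inf, F_c, gamma_c, residual)


# Synthetic frontier points generated from known, made-up constants.
compute = [1e1, 1e2, 1e3, 1e4]
loss = [0.1 + 2.0 * c ** -0.4 for c in compute]
L_inf_hat, F_c_hat, gamma_c_hat, _ = fit_with_floor(compute, loss)
```

Comparing the fitted $\gamma_c$ across architectures is what reveals whether a performance gap widens with compute: a larger exponent means a steeper line in log-log space.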

Paper Structure

This paper contains 45 sections, 21 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Performance of neural network interatomic potentials follows a power law (linear in log-log space) in training compute (PFLOPs, GPU-hours). The scaling behaviour varies with architectural complexity: the slope of the performance curve improves as the architecture changes from unconstrained to low-order to high-order, implying that performance gaps widen with increasing compute. Body order $\nu$: number of nodes whose states define a message within a layer. Tensor order $\ell$: order of geometric features processed by the models. Left: Empirical scaling laws along the FLOPs-optimal frontier. Right: Empirical scaling laws along the train-time-optimal frontier.
  • Figure 2: Estimation of $\kappa$ for architectures used in our study.
  • Figure 3: Pareto frontiers of training compute in log–log spaces. Top: Efficient loss-FLOPs frontier. Bottom: Efficient loss-train-time frontier. Across architectures, the log–log frontiers are approximately linear. Line color encodes model size (small, large).
  • Figure 4: Using higher orders of feature tensors in eSEN leads to better scaling exponents with respect to compute.
  • Figure 5: Top: Scaling number of training tokens. Bottom: Scaling number of parameters.
  • ...and 4 more figures