Scaling Laws and Symmetry, Evidence from Neural Force Fields
Khang Ngo, Siamak Ravanbakhsh
TL;DR
The paper studies how symmetry inductive biases affect scaling in neural network interatomic potentials (NNIPs) through a large-scale, geometry-aware study of compute, data, and parameter scaling across unconstrained and equivariant architectures. It fits two scaling laws: a compute-frontier law $L(C)=L_\infty + F_c C^{-\gamma_c}$ and a sum-power law $L(N,D)=L_\infty + A N^{-\alpha} + B D^{-\beta}$, where $C$ is training compute, $N$ is model size, $D$ is data size, and the exponents $\gamma_c$, $\alpha$, and $\beta$ depend on the architecture. Higher-order equivariant representations yield larger exponents, so the performance gap between architectures widens with scale. Compute-optimal training favors increasing $N$ and $D$ in tandem, with exponents $a, b \approx 0.5$ in the compute allocation. Enforcing symmetry through a loss term has limited impact on the compute-optimal frontier compared with building equivariance into the architecture, and the optimal depth grows with the degree of symmetry before saturating. These results have practical implications for scaling NNIPs and motivate the development of scalable, high-order equivariant models for atomistic simulations.
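As a rough illustration of the two functional forms in the TL;DR, the sketch below evaluates them and derives a compute-optimal split of $N$ and $D$. It is not the authors' fitting code. It assumes that compute scales as $C = k N D$ for a hypothetical constant `k`, which the summary does not state, and all coefficient values are placeholders rather than fitted results. Under that assumption, minimizing $A N^{-\alpha} + B D^{-\beta}$ at fixed $C$ gives allocation exponents $a = \beta/(\alpha+\beta)$ and $b = \alpha/(\alpha+\beta)$, so $a, b \approx 0.5$ corresponds to $\alpha \approx \beta$.

```python
def loss_compute_frontier(C, L_inf, F_c, gamma_c):
    """Compute-frontier law: L(C) = L_inf + F_c * C^(-gamma_c)."""
    return L_inf + F_c * C ** (-gamma_c)


def loss_sum_power_law(N, D, L_inf, A, alpha, B, beta):
    """Sum-power law: L(N, D) = L_inf + A * N^(-alpha) + B * D^(-beta)."""
    return L_inf + A * N ** (-alpha) + B * D ** (-beta)


def allocation_exponents(alpha, beta):
    """Exponents (a, b) with N_opt ~ C^a and D_opt ~ C^b.

    ASSUMPTION: compute is proportional to N * D. This relation is not
    given in the summary and is used here only for illustration.
    """
    a = beta / (alpha + beta)
    b = alpha / (alpha + beta)
    return a, b


def compute_optimal_split(C, A, alpha, B, beta, k=1.0):
    """Minimize A*N^(-alpha) + B*D^(-beta) subject to C = k * N * D.

    `k` is a hypothetical proportionality constant. Setting the derivative
    with respect to N to zero gives
        N^(alpha + beta) = (alpha * A) / (beta * B) * (C / k)^beta.
    """
    scale = ((alpha * A) / (beta * B)) ** (1.0 / (alpha + beta))
    N_opt = scale * (C / k) ** (beta / (alpha + beta))
    D_opt = C / (k * N_opt)
    return N_opt, D_opt


if __name__ == "__main__":
    # Placeholder coefficients, not values reported in the paper.
    A, alpha, B, beta, L_inf = 1.0, 0.3, 1.0, 0.3, 0.05
    N_opt, D_opt = compute_optimal_split(1e6, A, alpha, B, beta)
    print(allocation_exponents(alpha, beta))
    print(N_opt, D_opt)
    print(loss_sum_power_law(N_opt, D_opt, L_inf, A, alpha, B, beta))
```

With equal exponents the budget is split evenly on a log scale between model and data. If $\alpha$ and $\beta$ differ, the same formulas shift the allocation toward whichever term decays more slowly.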
Abstract
We present an empirical study in the geometric task of learning interatomic potentials, which shows equivariance matters even more at larger scales; we show a clear power-law scaling behaviour with respect to data, parameters and compute with "architecture-dependent exponents". In particular, we observe that equivariant architectures, which leverage task symmetry, scale better than non-equivariant models. Moreover, among equivariant architectures, higher-order representations translate to better scaling exponents. Our analysis also suggests that for compute-optimal training, the data and model sizes should scale in tandem regardless of the architecture. At a high level, these results suggest that, contrary to common belief, we should not leave it to the model to discover fundamental inductive biases such as symmetry, especially as we scale, because they change the inherent difficulty of the task and its scaling laws.
