Are neural scaling laws leading quantum chemistry astray?

Siwoo Lee, Adji Bousso Dieng

TL;DR

This work tests whether neural scaling laws can yield physically faithful quantum-chemical predictions by scaling model capacity and training data on large quantum-mechanical (QM) datasets and evaluating on the H$_2$ bond-dissociation curve. Although predictive error improves with more data and larger models, models trained only on stable structures fail to reproduce the H$_2$ energy curve. Foundation models trained on datasets that include dissociating diatomics can capture the H$_2$ curve but do not transfer to other diatomics and, strikingly, fail to reproduce the Coulomb repulsion between two bare protons. The study argues that scaling alone does not guarantee physical generalization and calls for physics-informed, data-efficient methods that extrapolate reliably. The results mark a critical boundary for current large-scale ML in quantum chemistry and motivate strategies beyond pure scaling.

Abstract

Neural scaling laws are driving the machine learning community toward training ever-larger foundation models across domains, promising high accuracy and transferable representations for extrapolative tasks. We test this promise in quantum chemistry by scaling model capacity and training data from quantum chemical calculations. As a generalization task, we evaluate the resulting models' predictions of the bond dissociation energy of neutral H$_2$, the simplest possible molecule. We find that, regardless of dataset size or model capacity, models trained only on stable structures fail dramatically to reproduce the H$_2$ energy curve, even qualitatively. Only when compressed and stretched geometries are explicitly included in training do the predictions roughly resemble the correct shape. Nonetheless, the largest foundation models trained on the largest and most diverse datasets containing dissociating diatomics exhibit serious failures on simple diatomic molecules. Most strikingly, they cannot reproduce the trivial repulsive energy curve of two bare protons, revealing their failure to learn even the basic Coulomb's law underlying electronic structure theory. These results suggest that scaling alone is insufficient for building reliable quantum chemical models.
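To make the "trivial repulsive energy curve of two bare protons" concrete, the following is a minimal sketch of the reference curve given by Coulomb's law, $E(r) = 1/r$ in atomic units for two unit charges. It is not the authors' code: the function names, the rounded unit-conversion constants, and the 0.4 to 7.5 Å sampling range are illustrative assumptions.

```python
# Reference Coulomb repulsion between two bare protons (no electrons).
# Constants are rounded conversion factors; all names here are hypothetical.

HARTREE_TO_KCAL_PER_MOL = 627.5095  # 1 Hartree in kcal/mol
BOHR_TO_ANGSTROM = 0.529177         # 1 Bohr in Angstrom


def proton_proton_energy(r_angstrom: float) -> float:
    """Return the Coulomb repulsion energy (kcal/mol) at separation r (Angstrom)."""
    if r_angstrom <= 0:
        raise ValueError("separation must be positive")
    r_bohr = r_angstrom / BOHR_TO_ANGSTROM
    # In atomic units E = Z_a * Z_b / r, with Z_a = Z_b = 1 for protons.
    return HARTREE_TO_KCAL_PER_MOL / r_bohr


def is_purely_repulsive(energies: list[float]) -> bool:
    """Check that a curve sampled at increasing r is positive and strictly decreasing."""
    positive = all(e > 0 for e in energies)
    decreasing = all(a > b for a, b in zip(energies, energies[1:]))
    return positive and decreasing


# Sample the curve on an evenly spaced grid (range chosen for illustration only).
n_points = 50
distances = [0.4 + i * (7.5 - 0.4) / (n_points - 1) for i in range(n_points)]
reference_curve = [proton_proton_energy(r) for r in distances]
```

A model's predicted energies on the same grid could be passed to `is_purely_repulsive` as a quick qualitative check: any minimum or negative region in the prediction would signal a departure from Coulomb's law.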

Paper Structure

This paper contains 17 sections, 6 equations, 6 figures.

Figures (6)

  • Figure 1: Prediction errors on test set splits of SchNet models of varying model capacity, $c$, trained on different numbers of training samples from (left) GDB-9-G4(MP2) and (right) VQM24 datasets. The mean absolute error is plotted against the number of training samples on a log-log plot. The horizontal dashed line denotes the baseline accuracy of 1 kcal/mol. Larger model capacities generally reduce errors, and more training data systematically reduces error.
  • Figure 2: Neutral H$_2$ bond dissociation energy curves (left column), equilibrium bond lengths (middle column; black dashed line indicates reference CCSD(T) and KSDFT values of 0.77 Å), and dissociation energies (right column; dashed dark and light lines indicate reference CCSD(T) and KSDFT values of 109 and 207 kcal/mol, respectively) obtained from CCSD(T) (cc-pVQZ basis), $\omega$B97M-V/cc-pVQZ KSDFT, and SchNet models of various $c$ trained on (top) 100k GDB-9-G4(MP2) samples and on (bottom) 500k VQM24 samples. Models trained on larger and more diverse datasets yield results that more closely match CCSD(T) and KSDFT references.
  • Figure 3: Distributions of smallest inter-atomic distance found in each sample in OMol25, VQM24, and GDB-9-G4(MP2) datasets. GDB-9-G4(MP2) is least diverse (0.96–1.33 Å), followed by VQM24 (0.75–2.35 Å), then OMol25 (0.40–7.50 Å).
  • Figure 4: Bond dissociation energy curves of neutral H$_2$, Li$_2$, Be$_2$, N$_2$, O$_2$, F$_2$, calculated using $\omega$B97M-V/def2-TZVPD KSDFT and estimated using UMA-S-1.1, UMA-M-1.1, OMol25 eSEN-sm-cons., Orb v3 conservative OMol25, and AIMNet2. AIMNet2 was only applied to H$_2$, N$_2$, O$_2$, and F$_2$ because it does not support elements of the other diatomics. The models generally perform well for H$_2$ but exhibit serious qualitative failures for the other systems.
  • Figure 5: Distribution of (left) elements (denoted by nuclear charge, $\mathcal{Z}$) involved in the 61,498 diatomic systems found in OMol25 dataset and (right) corresponding distribution of bond lengths. 38 elements are found with bond lengths ranging 0.62–7.50 Å.
  • ...and 1 more figures