Broken neural scaling laws in materials science

Max Großmann, Malte Grunert, Erich Runge

TL;DR

This work analyzes how neural scaling laws (NSLs) apply to a data-limited materials science task: predicting the high-dimensional dielectric response of metals by learning the interband dielectric function $\overline{\varepsilon}_{\mathrm{inter}}(\omega)$ and the Drude frequency $\overline{\omega}_{\mathrm{D}}$ from structure. Using a high-throughput dataset of over $2\times 10^5$ intermetallics and the invariant graph neural networks OptiMetal2B and OptiMetal3B, the authors quantify 1D and 2D NSLs across dataset size $D$ and parameter count $N$. They find broken data scaling with a crossover at $D_c$, and parameter scaling that saturates beyond $N\sim 5\times 10^{6}$. A Kaplan-type 2D NSL map shows that data efficiency improves with higher body order, but the data-scaling regime persists, underscoring a data bottleneck in materials ML. The results motivate data-efficient architectures and strategies such as transfer learning or multi-fidelity data to reduce data-generation costs in materials discovery.

Abstract

In materials science, data are scarce and expensive to generate, whether computationally or experimentally. Therefore, it is crucial to identify how model performance scales with dataset size and model capacity to distinguish between data- and model-limited regimes. Neural scaling laws provide a framework for quantifying this behavior and guide the design of materials datasets and machine learning architectures. Here, we investigate neural scaling laws for a paradigmatic materials science task: predicting the dielectric function of metals, a high-dimensional response that governs how solids interact with light. Using over 200,000 dielectric functions from high-throughput ab initio calculations, we study two multi-objective graph neural networks trained to predict the frequency-dependent complex interband dielectric function and the Drude frequency. We observe broken neural scaling laws with respect to dataset size, whereas scaling with the number of model parameters follows a simple power law that rapidly saturates.
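
For orientation, the scaling forms referred to above can be written down in their commonly used parameterizations. This is a sketch under stated assumptions, not the paper's own equations: the exact functional forms are given in the paper's Methods and figure legends, which are not reproduced here. The constants $a$, $b$, $c_0$, $c_1$, $f$, $N_0$, $D_0$, $\alpha_N$, and $\alpha_D$ are introduced only for illustration. A smoothly broken power law in dataset size with a single break at the crossover $D_c$ can be written as

$$
L(D) = a + b\,D^{-c_0}\left[1 + \left(\frac{D}{D_c}\right)^{1/f}\right]^{-c_1 f},
$$

which behaves like a power law with exponent $c_0$ for $D \ll D_c$ and with exponent $c_0 + c_1$ for $D \gg D_c$, while $f$ controls how sharp the transition is. A Kaplan-type two-dimensional law in parameter count $N$ and dataset size $D$ is usually written as

$$
L(N, D) = \left[\left(\frac{N_0}{N}\right)^{\alpha_N/\alpha_D} + \frac{D_0}{D}\right]^{\alpha_D},
$$

which reduces to a pure data power law $L \approx (D_0/D)^{\alpha_D}$ when the model is large enough that the first term is negligible, and to a pure parameter power law $L \approx (N_0/N)^{\alpha_N}$ in the limit of abundant data.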

Paper Structure

This paper contains 5 sections, 12 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: 1D NSLs for dataset and parameter scaling for OptiMetal2B and OptiMetal3B. The upper panel shows the validation loss $L_\mathrm{val}$ as a function of dataset size $D$ at fixed parameter count $N$; the lower panel shows $L_\mathrm{val}$ as a function of $N$ at fixed $D$. Dots represent $L_\mathrm{val}$ averaged over three random model initializations, and error bars indicate the standard deviation. The dashed lines show the corresponding best NSL fits, whose functional form is given in the panel legends (see Methods for details). We investigate the scaling behavior of OptiMetal2B using both CGC [Xie2018] and TC [Thekumparampil2018] message passing, as both yield nearly identical performance after the architecture optimization (see Supplementary Note 4). Fit parameters, including scaling exponents, are given in Tab. \ref{tab:scaling_laws}. In the upper panel, two conventional NSLs (simple power laws) fitted to the first and last three data points of OptiMetal2B (CGC) are shown as dotted black lines, annotated as "Asymptotic fit", to highlight the "brokenness" of the neural scaling behavior.
  • Figure 2: 2D NSL maps for the TC-based OptiMetal2B and OptiMetal3B. The validation loss $L_\mathrm{val}$ is shown as a function of dataset size $D$ and parameter count $N$ for OptiMetal2B and OptiMetal3B. Dots represent $L_\mathrm{val}$ averaged over three random model initializations, and error bars indicate the standard deviation. The dashed lines show the corresponding best 2D NSL fits, whose functional form follows Eq. (\ref{eq:kaplan_map}) (see Methods for details). Each panel provides a different illustration of the scaling behavior: the upper panel shows the validation loss for models with different dataset sizes $D$ as a function of parameter count $N$, as indicated by the color bar. The middle panel swaps the roles of dataset size $D$ and parameter count $N$. The lower panel provides a full 2D representation of the NSLs, though it omits the fits for clarity. Fit parameters, including scaling exponents, are given in Tab. \ref{tab:scaling_laws_placeholder_do_not_use}.
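
The "Asymptotic fit" idea from the Figure 1 caption (fitting a simple power law separately to the first and last three data points) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the data are synthetic, the function names `broken_power_law` and `loglog_slope` are hypothetical, and the curve is generated from a generic smoothly broken power law that steepens after the break. None of the numbers are taken from the paper, and the direction or size of the break in the real data may differ.

```python
import numpy as np


def broken_power_law(D, b, c0, c1, D_c, f):
    """Smoothly broken power law with a single break at D_c (no offset).

    Exponent is c0 for D << D_c and c0 + c1 for D >> D_c;
    f sets the sharpness of the transition.
    """
    return b * D ** (-c0) * (1.0 + (D / D_c) ** (1.0 / f)) ** (-c1 * f)


def loglog_slope(D, L):
    """Power-law exponent from a least-squares line fit in log-log space.

    Returns alpha such that L ~ D**(-alpha) over the supplied points.
    """
    slope, _intercept = np.polyfit(np.log(D), np.log(L), 1)
    return -slope


# Synthetic, hypothetical loss curve: 9 dataset sizes from 1e2 to 1e6.
D = np.logspace(2, 6, 9)
L = broken_power_law(D, b=1.0, c0=0.1, c1=0.3, D_c=1e4, f=0.2)

# "Asymptotic fits": simple power laws on the first and last three points.
alpha_early = loglog_slope(D[:3], L[:3])
alpha_late = loglog_slope(D[-3:], L[-3:])

print(f"exponent from first three points: {alpha_early:.3f}")
print(f"exponent from last three points:  {alpha_late:.3f}")
```

If a single power law described the whole curve, the two exponents would agree. A clear mismatch between them is the signature of broken scaling that the asymptotic fits in Figure 1 are meant to make visible.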