Beyond Adam: Disentangling Optimizer Effects in the Fine-Tuning of Atomistic Foundation Models

Xiaoqing Liu, Yangshuai Wang, Teng Zhao

TL;DR

This work investigates how optimizer choice shapes the fine-tuning of atomistic foundation models, introducing a preconditioning framework to interpret the dynamics of gradient-based updates. Using MACE-based backbone models across inorganic, organic, and liquid regimes, it benchmarks seven first-order optimizers and a brief second-order refinement, linking optimization dynamics to energy-force fidelity and to downstream properties such as elastic moduli and phonons. The study finds that AdamW and ScheduleFree consistently deliver better curvature conditioning and smoother potentials than Adam, that SGD performs poorly, and that L-BFGS refinement provides targeted gains in heterogeneous or interfacial landscapes. These findings offer practical guidelines for selecting and designing optimizers for universal interatomic potentials, highlighting the value of curvature-aware strategies for stable, accurate fine-tuning and reliable multiscale dynamics.
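
The two-stage protocol (first-order fine-tuning, optionally followed by a brief L-BFGS refinement) can be illustrated on a toy problem. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it replaces the MACE fine-tuning loss with a hypothetical ill-conditioned quadratic (`make_quadratic`), implements AdamW with bias correction and decoupled weight decay, and then refines with a textbook L-BFGS (two-loop recursion with Armijo backtracking). All helper names and hyperparameters are assumptions chosen for the toy problem.

```python
import numpy as np

def make_quadratic(eigs, seed=0):
    """Hypothetical stand-in for a fine-tuning loss: L(w) = 0.5 (w - w*)^T H (w - w*)."""
    rng = np.random.default_rng(seed)
    n = len(eigs)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal basis
    H = q @ np.diag(eigs) @ q.T                        # SPD Hessian with the given spectrum
    w_star = rng.standard_normal(n)
    loss = lambda w: 0.5 * (w - w_star) @ H @ (w - w_star)
    grad = lambda w: H @ (w - w_star)
    return loss, grad

def adamw(grad, w, steps=500, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8, wd=1e-4):
    """Stage 1: AdamW with bias correction and decoupled weight decay."""
    m, v = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
        # Weight decay acts on w directly instead of being added to the gradient.
        w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w

def lbfgs(loss, grad, w, iters=200, history=20, tol=1e-10):
    """Stage 2: textbook L-BFGS (two-loop recursion + Armijo backtracking) on the plain loss."""
    s_hist, y_hist = [], []
    g = grad(w)
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        # Two-loop recursion: r approximates H^{-1} g from stored curvature pairs (s, y).
        q, alphas = g.copy(), []
        for s, y in zip(reversed(s_hist), reversed(y_hist)):  # newest to oldest
            a = (s @ q) / (y @ s)
            alphas.append(a)
            q = q - a * y
        gamma = (s_hist[-1] @ y_hist[-1]) / (y_hist[-1] @ y_hist[-1]) if s_hist else 1.0
        r = gamma * q
        for s, y, a in zip(s_hist, y_hist, reversed(alphas)):  # oldest to newest
            b = (y @ r) / (y @ s)
            r = r + s * (a - b)
        d = -r if g @ r > 0 else -g  # fall back to steepest descent if not a descent direction
        # Armijo backtracking: halve the step until sufficient decrease holds.
        step, f0, slope = 1.0, loss(w), g @ d
        while loss(w + step * d) > f0 + 1e-4 * step * slope:
            step *= 0.5
            if step < 1e-12:
                return w  # line search failed; keep the last accepted iterate
        w_new = w + step * d
        g_new = grad(w_new)
        s, y = w_new - w, g_new - g
        if y @ s > 1e-10 * (s @ s):  # store the pair only if curvature is positive
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > history:
                s_hist.pop(0)
                y_hist.pop(0)
        w, g = w_new, g_new
    return w

loss, grad = make_quadratic(np.logspace(-2, 2, 20))  # condition number 1e4
w0 = np.zeros(20)
w1 = adamw(grad, w0)        # first-order fine-tuning stage
w2 = lbfgs(loss, grad, w1)  # brief second-order refinement
print(f"loss: start {loss(w0):.2e}, after AdamW {loss(w1):.2e}, after L-BFGS {loss(w2):.2e}")
```

On this toy quadratic the refinement stage mainly removes the residual error that AdamW leaves along poorly scaled directions; how much of that carries over to a real MACE loss landscape is exactly what the benchmark is designed to measure.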

Abstract

Atomistic foundation models constitute a paradigm shift in computational materials science by providing universal machine-learned interatomic potentials with broad transferability across chemical spaces. Although fine-tuning is essential for adapting these pretrained models to specific target systems, the influence of the optimization algorithm on this process remains insufficiently characterized. In this work, we perform a rigorous benchmark of seven first-order optimizers, namely Adam, AdamW, RAdam, SGD, LAMB, Ranger, and ScheduleFree, for the fine-tuning of foundation models across molecular, crystalline, and liquid regimes. We evaluate these algorithms on energy and force accuracy for both in-distribution and out-of-distribution configurations, as well as on their impact on downstream physical properties such as elastic moduli, phonon spectra, and interfacial dynamics. We interpret these empirical results through a preconditioning framework that views each optimizer as a data-dependent linear transformation of the gradient. This analysis clarifies how different update rules impose specific spectral filters on the effective loss Hessian. Across all regimes, AdamW and ScheduleFree achieve superior curvature conditioning and force accuracy, whereas stochastic gradient descent (SGD) exhibits slow convergence and instability. Furthermore, we demonstrate that a brief second-order refinement stage reduces residual anisotropy in the loss landscape and enhances the fidelity of physical observables without increasing inference costs. These findings provide conceptual insight and practical guidance for selecting and designing optimizers to ensure the stable and efficient fine-tuning of universal interatomic potentials.
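
One common way to make the preconditioning view concrete: a step is $\Delta\theta = -\eta P_t g_t$, and near a minimum the curvature the optimizer effectively sees is $P^{1/2} H P^{1/2}$, so a good preconditioner shrinks its condition number. The NumPy sketch below assumes a hypothetical Hessian $H = DAD$ (a well-conditioned core $A$ with badly scaled parameter groups $D$) and idealizes Adam's second-moment estimate as $\hat v \approx \mathrm{diag}(H)^2$, which turns Adam's diagonal preconditioner into Jacobi scaling. This is an illustration of the idea, not the paper's spectral analysis.

```python
import numpy as np

def condition_number(M):
    """Ratio of largest to smallest eigenvalue of a symmetric positive definite matrix."""
    eig = np.linalg.eigvalsh(M)  # ascending order
    return eig[-1] / eig[0]

def effective_hessian(H, p):
    """P^{1/2} H P^{1/2} for a diagonal preconditioner P = diag(p)."""
    r = np.sqrt(p)
    return r[:, None] * H * r[None, :]

def adam_diag_preconditioner(v_hat, eps=1e-8):
    """Adam-style diagonal preconditioner: P = diag(1 / (sqrt(v_hat) + eps))."""
    return 1.0 / (np.sqrt(v_hat) + eps)

n = 5
A = 0.9 * np.eye(n) + 0.1 * np.ones((n, n))  # well-conditioned core: eigenvalues 0.9 and 1.4
D = np.diag(np.logspace(0, 3, n))             # hypothetical badly scaled parameter groups
H = D @ A @ D                                 # ill-conditioned loss Hessian

# SGD corresponds to P = I, so the effective Hessian is H itself.
kappa_sgd = condition_number(effective_hessian(H, np.ones(n)))

# Idealization (assumption): v_hat ~ diag(H)^2, so Adam's P approximates diag(1 / H_ii).
p_adam = adam_diag_preconditioner(np.diag(H) ** 2)
kappa_adam = condition_number(effective_hessian(H, p_adam))

print(f"condition number, SGD (P = I): {kappa_sgd:.3e}")
print(f"condition number, Adam-like diagonal P: {kappa_adam:.4f}")
```

A diagonal preconditioner can only undo per-coordinate scaling like $D$; curvature that is anisotropic in rotated directions survives it, which is one way to read the residual anisotropy that a second-order refinement stage targets.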

Paper Structure

This paper contains 31 sections, 19 equations, 6 figures, 4 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Schematic illustration of the study design. A pretrained MACE-based foundation model is fine-tuned on inorganic, molecular, and liquid benchmarks using various first-order optimizers, optionally followed by an L-BFGS refinement stage.
  • Figure 2: Force RMSE relative improvement compared with Adam across four test conditions. Negative values indicate worse performance than Adam.
  • Figure 3: Static mechanical properties and generalized stacking fault energy (GSFE) for BCC Mo. (a) Atomic structure of Mo with a dislocation, where the defect core is highlighted in red. (b) Relative errors of the lattice constant, elastic constants $C_{11}$, $C_{12}$, $C_{44}$, bulk modulus $B$, and Poisson's ratio $\nu$ for Adam, AdamW, and ScheduleFree, with and without L-BFGS refinement. (c) GSFE along the $\langle 121\rangle$ slip path compared with DFT. (d) GSFE RMSE.
  • Figure 4: Phonon dispersions of monolayer MoS$_2$. (a) Relaxed MoS$_2$ monolayer. (b) First Brillouin zone and high-symmetry path $\Gamma\text{--}M\text{--}K\text{--}\Gamma$. (c) Phonon spectra (top) and pointwise mean absolute error (MAE, bottom) along $\Gamma\text{--}M\text{--}K\text{--}\Gamma$ for models trained with different first-order optimizers without refinement. Dashed lines denote the DFT reference. (d) Same quantities after the L-BFGS refinement.
  • Figure 5: Comparison of MACE-MD simulations of water on graphene in the NVT ensemble at 300 K, using models trained with different optimization strategies. (a) Temperature evolution over a 100 ps trajectory. The model trained with Adam followed by L-BFGS (red) exhibits significant thermal fluctuations despite the presence of a thermostat. (b) Oxygen-oxygen radial distribution function (RDF), $g_{\text{OO}}(r)$. (c) Mean squared displacement (MSD) of oxygen atoms, with the self-diffusion coefficient ($D$) indicated. (d) Probability density of the H-O-H bond angle ($\theta_{\text{H-O-H}}$).
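
Two derived quantities in the captions are not defined in this summary. A plausible reading, which is an assumption rather than the paper's stated formula, is that Figure 2 reports the percentage change $100\,(\mathrm{RMSE}_{\text{Adam}} - \mathrm{RMSE}_{\text{opt}})/\mathrm{RMSE}_{\text{Adam}}$, which is negative when an optimizer is worse than Adam, and that $D$ in Figure 5(c) comes from the Einstein relation $\mathrm{MSD}(t) \approx 2dDt$ at long times. The helpers below (`relative_improvement` and `diffusion_coefficient` are hypothetical names) compute both; the inputs in the usage lines are synthetic and are not results from the paper.

```python
import numpy as np

def relative_improvement(rmse_opt, rmse_adam):
    """Percent improvement over the Adam baseline; negative means worse than Adam."""
    return 100.0 * (rmse_adam - rmse_opt) / rmse_adam

def diffusion_coefficient(t, msd, dim=3, fit_from=0.2):
    """Einstein relation MSD(t) ~ 2*dim*D*t: fit a line to the late-time MSD, return slope / (2*dim)."""
    start = int(fit_from * len(t))  # skip the short-time (ballistic) part of the curve
    slope, _intercept = np.polyfit(t[start:], msd[start:], 1)  # degree-1 fit: [slope, intercept]
    return slope / (2 * dim)

# Synthetic, illustrative inputs only.
print(relative_improvement(rmse_opt=0.08, rmse_adam=0.10))
t = np.linspace(0.0, 100.0, 1001)  # time in ps
msd = 6 * 0.25 * t                 # synthetic MSD in Å^2 for D = 0.25 Å^2/ps
print(diffusion_coefficient(t, msd))
```

For water adsorbed on graphene the motion is partly confined, so fitting only the in-plane MSD with `dim=2` may be the more appropriate convention; which choice the paper makes is not stated here.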