Projected Hessian Learning: Fast Curvature Supervision for Accurate Machine-Learning Interatomic Potentials

Austin Rodriguez; Justin S. Smith; Sakib Matin; Nicholas Lubbers; Kipton Barros; Jose L. Mendoza-Cortes

TL;DR

Projected Hessian Learning (PHL) is a scalable second-order training framework that injects curvature information using only Hessian-vector products (HVPs). It replaces explicit Hessian supervision with force-complexity curvature training, retaining most second-order accuracy gains while scaling to larger, more complex molecular systems.

Abstract

The Hessian matrix (second derivatives) encodes far richer local curvature of the potential energy surface than energies and forces alone. However, training machine-learning interatomic potentials (MLIPs) with full Hessians is often impractical because explicitly forming and storing Hessian matrices scales quadratically in cost and memory. We introduce Projected Hessian Learning (PHL), a scalable second-order training framework that injects curvature information using only Hessian-vector products (HVPs). Rather than constructing the Hessian, PHL projects curvature along stochastic probe directions and uses an unbiased stochastic trace-based loss with favorable system-size scaling, enabling curvature-informed training without quadratic memory growth. We benchmark PHL on a chemically diverse dataset of reactants, products, transition states, intrinsic reaction coordinates, and normal-mode sampled geometries computed at ωB97X-D/6-31G(d). We compare energy-force training (E-F), two HVP-based schemes (E-F-HVP with one-hot or randomized probes), and full energy-force-Hessian training (E-F-H). With randomized probes per minibatch, both HVP schemes match full-Hessian training in energy, force, and Hessian accuracy while delivering >24× epoch speedups for the small molecular systems studied. In a fixed-probe regime with one HVP per molecule, randomized projections consistently outperform one-column probing, especially for far-from-equilibrium geometries. Overall, PHL replaces explicit Hessian supervision with force-complexity curvature training, retaining most second-order accuracy gains while scaling to larger, more complex molecular systems.
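The "unbiased stochastic trace-based loss" mentioned in the abstract rests on a standard identity: for probe vectors $v$ with $\mathbb{E}[vv^\top] = I$, the expected squared norm of the projected error $(H_\text{pred} - H_\text{ref})v$ equals the squared Frobenius norm of the full Hessian error. The sketch below checks this numerically on a small toy matrix. It is an illustration only, not the authors' implementation: the matrix `D_ERR`, the function names, and the factor-of-`n` scaling used for one-hot probes are all assumptions introduced here.

```python
import itertools

def matvec(A, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def frobenius_sq(A):
    """Squared Frobenius norm: the full-Hessian loss for error matrix A."""
    return sum(a * a for row in A for a in row)

def projected_sq_error(D, v):
    """Squared norm of the projected curvature error D @ v (one HVP-sized quantity)."""
    return sum(c * c for c in matvec(D, v))

def rademacher_expectation(D):
    """Exact average of the projected loss over all +/-1 probe vectors."""
    n = len(D)
    probes = list(itertools.product((-1.0, 1.0), repeat=n))
    return sum(projected_sq_error(D, v) for v in probes) / len(probes)

def one_hot_expectation(D):
    """Exact average over uniformly chosen one-hot probes.

    Assumption: each term is scaled by n so that E[v v^T] = I also holds here.
    """
    n = len(D)
    total = 0.0
    for j in range(n):
        e_j = [1.0 if k == j else 0.0 for k in range(n)]
        total += n * projected_sq_error(D, e_j)
    return total / n

# Hypothetical symmetric error matrix D = H_pred - H_ref (toy values).
D_ERR = [[0.5, -0.1, 0.0],
         [-0.1, 0.2, 0.3],
         [0.0, 0.3, -0.4]]

full_loss = frobenius_sq(D_ERR)
print(full_loss, rademacher_expectation(D_ERR), one_hot_expectation(D_ERR))
```

Both probe distributions recover the full-Hessian loss in expectation. They differ in what a single draw sees: a one-hot probe touches one column of the error, while a random sign vector mixes all columns.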

Paper Structure (27 sections, 30 equations, 11 figures, 2 tables, 1 algorithm)

Introduction
Mathematical Background
Stochastic estimation of the Hessian loss
Randomizing over mini-batches
Random vector distributions
Methodology
Dataset Preparation
Training Objective and Strategies
Loss Function Formulation
Results and Discussion
Validation Loss Convergence
Predictive Accuracy Across Datasets
Randomized HVP Vectors per Minibatch
Fixed HVP Vectors per System
Statistical Significance of RMSE Differences
...and 12 more sections

Figures (11)

Figure 1: Conceptual comparison of curvature supervision strategies used to train MLIPs. Left: Full Hessian training explicitly uses all $(3N)\times(3N)$ second-derivative elements of $M$ systems. Middle: One-column HVP probing samples a single Hessian column via a canonical basis vector. Right: Projected Hessian Learning (PHL) (this work). Curvature is enforced through stochastic projections using random probing vectors to form Hessian-vector products, yielding random weighted combinations of Hessian columns (Hutchinson-style probing). Both HVP-based strategies avoid explicit Hessian construction; PHL aggregates information across multiple curvature directions in expectation.
Figure 2: Validation loss curves for energy, force, and Hessian predictions under the fixed-vector approach. Each panel shows the RMSE of the validation set as a function of training epoch for energy and force training (E–F, red), energy, force, and Hessian–vector product training using the one-column method (E–F–HVP One-Column, orange), energy, force, and HVP training using the Hutchinson estimator (E–F–HVP PHL, purple), and full energy–force–Hessian training (E–F–H, blue).
Figure 3: Validation RMSE comparison across the Test Set, IRC dataset, and NMS dataset for models trained with different levels of information using (a) randomized probe vectors (resampled each minibatch) and (b) fixed probe vectors (one probe per system). Results are shown for energy and force training (E–F); energy, force, and Hessian–vector products (HVP) using the one-column method (E–F–HVP One-Column); energy, force, and HVP using the Hutchinson estimator (E–F–HVP PHL); and energy, force, and full Hessian training (E–F–H). From top to bottom, panels report RMSE for energies, forces, and Hessians. Bars represent mean RMSE values, with error bars indicating variability across ensembles of five independently trained models. Incorporating Hessian information improves accuracy relative to E–F across all datasets; under randomized probing, one-hot and PHL achieve statistically indistinguishable performance for the small molecular systems studied here, whereas under fixed probes PHL provides systematically lower errors, most notably for extrapolative NMS geometries.
Figure 4: Execution time per training epoch for different methods: energy and force training (E–F); energy, force, and Hessian–vector products (HVP) using the one-column method (E–F–HVP One-Column); energy, force, and HVP using the PHL method (E–F–HVP PHL); and energy, force, and full Hessian training (E–F–H). Bars show mean times with error bars indicating variability across epochs. Both one-column and PHL estimators achieve more than a 24-fold speedup compared to full Hessian training, while E–F training is the fastest but lacks Hessian information.
Figure 5: Scaling of CPU time with system size for density functional theory (DFT) calculations using Gaussian16 on a semi-log plot (a) and a log-log plot (b). Shown are the computational costs for evaluating total energies (green squares), forces (red triangles), and full Hessians (blue diamonds), and the estimated cost for evaluating Hessian–vector products (HVP, purple line). While force and energy costs scale comparably, the cost of full Hessian evaluations grows much more steeply with the number of atoms. In contrast, HVP evaluations provide a significantly cheaper alternative, with scaling closer to that of two force calculations, highlighting their efficiency for incorporating second-derivative information.
...and 6 more figures
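The Figure 5 caption describes the estimated HVP cost as scaling close to that of two force calculations. One standard construction consistent with that cost is a central difference of forces, $Hv \approx [F(x - \epsilon v) - F(x + \epsilon v)] / (2\epsilon)$ with $F = -\nabla E$. The sketch below applies it to a toy quadratic energy. The page does not say how the reference HVPs were obtained, so this is an assumption, and `H_TOY`, `forces`, and `hvp_from_forces` are hypothetical names used only for illustration.

```python
def matvec(A, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def forces(x, H):
    """Forces F = -dE/dx for the toy energy E(x) = 0.5 * x^T H x."""
    return [-g for g in matvec(H, x)]

def hvp_from_forces(force_fn, x, v, eps=1e-3):
    """Central-difference HVP using exactly two force evaluations."""
    x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
    f_plus = force_fn(x_plus)    # first force call
    f_minus = force_fn(x_minus)  # second force call
    # F = -grad E, so the Hessian action is the negative force difference.
    return [(fm - fp) / (2.0 * eps) for fm, fp in zip(f_minus, f_plus)]

# Hypothetical 3x3 symmetric Hessian, geometry, and probe direction.
H_TOY = [[2.0, -0.5, 0.0],
         [-0.5, 1.5, 0.3],
         [0.0, 0.3, 1.0]]
x0 = [0.1, -0.2, 0.3]
v = [1.0, -1.0, 1.0]

hv_fd = hvp_from_forces(lambda x: forces(x, H_TOY), x0, v)
hv_exact = matvec(H_TOY, v)
print(hv_fd, hv_exact)
```

For a quadratic energy the central difference is exact up to rounding. For a real potential energy surface the step size `eps` trades truncation error against numerical noise in the forces.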
