Bayesian Optimization with Gaussian Processes to Accelerate Stationary Point Searches

Rohit Goswami

TL;DR

We present a unified Bayesian Optimization view of minimization, single-point saddle searches, and double-ended saddle searches: all three share one six-step surrogate loop and differ only in the inner optimization target and acquisition criterion.

Abstract

Efforts to accelerate the exploration of stationary points on potential energy surfaces by building local surrogates span decades. Done correctly, surrogates reduce the number of required evaluations by an order of magnitude while preserving the accuracy of the underlying theory. We present a unified Bayesian Optimization view of minimization, single-point saddle searches, and double-ended saddle searches through a single six-step surrogate loop, in which the methods differ only in the inner optimization target and acquisition criterion. The framework uses Gaussian process regression with derivative observations, inverse-distance kernels, and active learning. The Optimal Transport GP extensions (farthest point sampling with Earth mover's distance, MAP regularization via variance barrier and oscillation detection, and an adaptive trust radius) are concrete extensions of the same basic methodology that improve accuracy and efficiency. We also demonstrate that random Fourier features decouple hyperparameter training from prediction, enabling favorable scaling for high-dimensional systems. Accompanying pedagogical Rust code demonstrates that all applications use the exact same Bayesian optimization loop, bridging the gap between theoretical formulation and practical execution.

Paper Structure

This paper contains 50 sections, 60 equations, 22 figures, 6 tables, 6 algorithms.

Figures (22)

  • Figure 1: Geometry of the dimer and Householder reflection. The dimer pair ($\mathbf{R}_1, \mathbf{R}_2$) straddles the midpoint $\mathbf{R}_0$ with axis $\hat{\mathbf{N}}$. The true force $\mathbf{F}$ (blue) is reflected about the hyperplane perpendicular to $\hat{\mathbf{N}}$, producing the modified force $\mathbf{F}^{\dagger}$ (coral) that climbs along the minimum mode while relaxing perpendicular to it.
  • Figure 2: GP conditioning in three panels. (Left) Before any data, the prior (Eq. \ref{eq:gp_prior}) admits a wide family of smooth functions. (Center) Oracle evaluations supply energies and forces at selected configurations. (Right) Conditioning on the data collapses the posterior near training points while preserving wide uncertainty elsewhere; the posterior mean serves as the surrogate surface $V_{\text{GP}}$.
  • Figure 3: GP surrogate fidelity as a function of training set size on the Müller-Brown surface. The panels show the GP posterior mean contours after training on $M = 5, 15, 30, 50$ Latin hypercube-sampled configurations (black dots). With 5 points the surrogate captures only crude basin structure; by 30 points the contours closely match the true PES (Figure \ref{fig:mb_neb}) in the sampled region.
  • Figure 4: GP predictive variance on the Müller-Brown surface after 20 training evaluations clustered near minimum A and saddle S1 (black dots). The variance is near zero close to training data and grows with distance, reaching a maximum (coral diamond) in the unexplored region. This variance landscape is the basis for active learning: the next electronic structure evaluation is placed where the GP is least certain.
  • Figure 5: Block structure of the full covariance matrix $K_{\text{full}}$. The base kernel in feature space generates four Cartesian-space blocks through differentiation via the feature Jacobian J. Darker shading indicates higher computational cost.
  • ...and 17 more figures
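
The Householder reflection described in the Figure 1 caption can be sketched in a few lines of Rust. This is a minimal illustration, assuming the standard form $\mathbf{F}^{\dagger} = \mathbf{F} - 2(\mathbf{F}\cdot\hat{\mathbf{N}})\hat{\mathbf{N}}$; the function names `dot` and `householder_reflect` are hypothetical and are not taken from the accompanying code.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Reflect `force` about the hyperplane perpendicular to `axis`.
///
/// Assumed form: F_dagger = F - 2 (F . N_hat) N_hat, with N_hat = axis / |axis|.
/// The axis does not need to be pre-normalized; the division by |axis|^2
/// takes care of that. The component of the force along the axis is
/// inverted, and the perpendicular components are left unchanged.
fn householder_reflect(force: &[f64], axis: &[f64]) -> Vec<f64> {
    assert_eq!(force.len(), axis.len(), "force and axis must have equal length");
    let norm2 = dot(axis, axis);
    assert!(norm2 > 0.0, "axis must be non-zero");
    let scale = 2.0 * dot(force, axis) / norm2;
    force.iter().zip(axis).map(|(f, n)| f - scale * n).collect()
}
```

For example, calling `householder_reflect(&[1.0, 2.0], &[1.0, 0.0])` flips the sign of the first component and keeps the second. Following the modified force therefore moves uphill along the axis while still relaxing in the perpendicular directions, which is the behavior the caption describes.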