Machine-Learned Leftmost Hessian Eigenvectors for Robust Transition State Finding

Guanchen Wu, Chung-Yueh Yuan, Kareem Hegazy, Samuel M. Blau, Teresa Head-Gordon

Abstract

The reliable determination of transition states (TSs) benefits from second-order information for robust convergence and validation, but the computational expense of Hessians prohibits their routine use in TS optimization. Here, we present a machine-learning-driven TS optimizer that directly predicts the leftmost Hessian eigenvector (LMHE), the critical mode that locally approximates the reaction coordinate encompassing the TS. We demonstrate that our LMHE optimizer recovers TS solutions at the same rate as full-Hessian optimizers, and more robustly from degraded initial guess geometries, thereby eliminating the excessively long wall times characteristic of full-Hessian approaches and reducing total gradient evaluations compared to standard quasi-Newton methods. We further improve accuracy and robustness by using uncertainty quantification to identify occasional LMHE prediction failures; when one is detected, the optimizer falls back to a full Hessian update from the machine-learned potential at that optimization step, avoiding expensive active learning. Overall, our methodology and semi-automated workflow deliver second-order stability at first-order computational expense, providing a highly efficient engine for high-throughput reaction discovery.
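To illustrate why a single eigenvector can carry the curvature information a TS search needs, here is a minimal sketch of a generic minimum-mode-following step on a toy two-dimensional surface. It is not the optimizer described in the paper: the surface `f(x, y) = (x**2 - 1)**2 + y**2`, the fixed step size `lr`, and all function names are assumptions made for illustration, and the analytic `leftmost_eigvec` merely stands in for the machine-learned LMHE prediction.

```python
import math

def gradient(x, y):
    """Gradient of the toy surface f(x, y) = (x**2 - 1)**2 + y**2."""
    return (4.0 * x * (x * x - 1.0), 2.0 * y)

def hessian(x, y):
    """Analytic Hessian of the toy surface, returned as (hxx, hxy, hyy)."""
    return (12.0 * x * x - 4.0, 0.0, 2.0)

def leftmost_eigvec(hxx, hxy, hyy):
    """Unit eigenvector for the lowest eigenvalue of a symmetric 2x2 matrix.

    Stand-in for the learned LMHE prediction (an assumption of this sketch).
    """
    mean = 0.5 * (hxx + hyy)
    radius = math.hypot(0.5 * (hxx - hyy), hxy)
    lam = mean - radius  # lowest eigenvalue
    # Both rows of (H - lam*I) v = 0 give a valid direction; use the longer one.
    v1 = (hxy, lam - hxx)
    v2 = (lam - hyy, hxy)
    v = v1 if math.hypot(*v1) >= math.hypot(*v2) else v2
    norm = math.hypot(*v)
    if norm == 0.0:  # H is a multiple of the identity; any direction works
        return (1.0, 0.0)
    return (v[0] / norm, v[1] / norm)

def ts_step(x, y, lr=0.1):
    """One step that ascends along the leftmost mode and descends elsewhere."""
    gx, gy = gradient(x, y)
    vx, vy = leftmost_eigvec(*hessian(x, y))
    proj = gx * vx + gy * vy
    # Reflect the gradient through the plane orthogonal to the leftmost mode.
    fx = gx - 2.0 * proj * vx
    fy = gy - 2.0 * proj * vy
    return x - lr * fx, y - lr * fy

# Start away from the saddle point at (0, 0) and iterate.
x, y = 0.3, 0.4
for _ in range(200):
    x, y = ts_step(x, y)
```

On this surface the iteration contracts toward the first-order saddle at the origin, whereas plain gradient descent from the same start would slide into the minimum at `(1, 0)`. Only the gradient and one direction are needed per step, which is the sense in which a predicted LMHE can substitute for a full Hessian.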

Paper Structure (11 sections, 10 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: GotenNet-GA model architecture used for transition state searches with the learned LMHE. (a) The macroscopic framework, comprising a GotenNet encoder, a global interaction module, and a readout. (b) Architecture of the Global Attention block, illustrating its internal mechanism and explicitly incorporating the encompassing residual connection shown in (a). (c) The Global Aggregation block which maps features from the physical graph nodes to the latent inducing points. (d) The Global Broadcast block, which propagates the updated global context from inducing points back to the physical nodes. In the figure, $+$ denotes addition, $\cdot$ denotes dot product, $\sum$ denotes aggregation with summation, $\mathcal{N}$ denotes the set of all real nodes in the graph, $\mathcal{P}$ denotes the set of all inducing points, and LN denotes layer normalization.
  • Figure 2: Comparison of predictions of different TS optimizers on 240 Sella benchmark reactions. The performance of the LMHE approach against the general QN baseline (TS-BFGS) and full Hessian TS optimization methods is quantified by comparing the predicted reaction paths obtained through IRC calculations against the exact reactant and product states defined by KinBot templates. Outcomes are classified as intended (the IRC connects the exact target states), partially intended (only one target state is matched), unintended (a valid TS is found, but it connects to an alternative pathway), no reaction (the optimization collapsed to a local minimum), or TS error (failure to converge). The single-model LMHE strategy exhibits a higher failure rate due to inaccurate LMHE predictions. The LMHE ensemble consistency check identifies unreliable predictions and triggers a fallback to exact Hessian calculations. This reduces failure rates to levels competitive with full Hessian optimization and significantly outperforms the standard QN baseline.
  • Figure 3: Comparison of the robustness of TS optimizers for 240 test combustion reactions. (a) The count of intended TS optimizations is plotted against the amplitude of random Gaussian noise applied to the initial geometries. The LMHE optimizer yields significantly more intended outcomes than the standard TS-BFGS QN method, maintaining high success rates comparable to the full Hessian approach even at high noise levels. (b) The number of TS optimization runs failing to converge is plotted against noise level. The single-model LMHE strategy exhibits a higher failure rate due to inaccurate LMHE predictions. The LMHE ensemble consistency check identifies unreliable predictions, triggering a fallback to exact Hessian calculations and reducing failure rates to levels competitive with the full Hessian and QN baselines. Shaded regions represent the 95$\%$ confidence interval derived from 10 independent noise realizations.
  • Figure 4: Comparison of wall time and number of gradient evaluations between TS optimizers. (a) The distribution of wall times required for converged optimizations across all noise levels. While the standard QN method is the fastest, the full Hessian method suffers from a heavy tail due to the high cost of exact second-derivative calculations. The LMHE achieves a favorable balance, significantly reducing the computational cost relative to the full Hessian approach while providing the curvature information that the QN baseline lacks and that robust convergence requires. (b) The distribution of total gradient calculations required for converged optimizations. While the full Hessian method requires the fewest evaluations due to exact curvature information, among the approximate methods both LMHE approaches require fewer total gradient evaluations than the standard QN method. To ensure comparability, the data include only those optimizations for which all optimizers successfully recovered the intended TS from the same initial geometry.
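The ensemble consistency check mentioned in the captions of Figures 2 and 3 can be sketched as follows. This is a hedged illustration, not the paper's criterion: the agreement measure (minimum pairwise absolute overlap), the `threshold` value of 0.9, and the names `select_mode` and `fallback` are all assumptions. The absolute value is used because an eigenvector is only defined up to sign.

```python
import math
from itertools import combinations

def unit(v):
    """Return v scaled to unit length, as a tuple."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def dot(u, w):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, w))

def min_pairwise_overlap(vectors):
    """Smallest |cosine| between any two ensemble predictions (sign-invariant)."""
    units = [unit(v) for v in vectors]
    return min((abs(dot(u, w)) for u, w in combinations(units, 2)), default=1.0)

def select_mode(predictions, fallback, threshold=0.9):
    """Pick the mode to follow and report whether the fallback was used.

    predictions: eigenvector guesses from the ensemble members.
    fallback:    hypothetical callable returning the leftmost eigenvector
                 from a full Hessian; only invoked when the ensemble disagrees.
    threshold:   assumed agreement cutoff, not a value taken from the paper.
    """
    if min_pairwise_overlap(predictions) >= threshold:
        # Members agree: average them after aligning signs to the first one.
        ref = unit(predictions[0])
        acc = [0.0] * len(ref)
        for p in predictions:
            u = unit(p)
            sign = 1.0 if dot(u, ref) >= 0.0 else -1.0
            acc = [a + sign * c for a, c in zip(acc, u)]
        return unit(acc), False
    # Members disagree: treat the prediction as unreliable and fall back.
    return unit(fallback()), True
```

Because the fallback is called only when the members disagree, the cost of a full Hessian is paid on the occasional unreliable step rather than at every iteration.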