A large language model-type architecture for high-dimensional molecular potential energy surfaces

Xiao Zhu, Srinivasan S. Iyengar

TL;DR

The paper tackles the challenge of constructing accurate high-dimensional potential energy surfaces (PES) for large molecular systems at CCSD-level accuracy. It introduces a graph-theoretic fragmentation framework in which a molecule is partitioned into rank-$r$ simplexes and the full energy is reconstructed as a weighted sum of fragment corrections, $E^{target}(\bar{x}) = E^{Ref}(\bar{x}) + \sum_{r=0}^{\mathcal{R}} \sum_{\alpha_r \in {\bf V}_r} {\cal M}_{\alpha_r,r}^{\mathcal{R}} \Delta E_{\alpha_r,r}(\bar{x}_{\alpha_r,r})$, enabling a family of neural networks with drastically reduced complexity. The authors validate the approach on solvated Zundel ($\mathcal{D}=51$) and extend it to the protonated 21-water cluster ($\mathcal{D}=186$), achieving sub-kcal/mol accuracy for full PES and approaching CCSD-level accuracy through incremental transfer learning and an attention-like platform. By exploiting fragment-wise learning, geometric tessellation, and slice-by-slice domain expansion, the method attains CCSD-quality PES for a large hydrated system with substantially lower data and computational cost than direct high-level calculations, offering a scalable route to full-dimensional PES in complex hydrogen-bonded networks.
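To make the structure of the fragment sum concrete, here is a minimal Python sketch of how a reference energy and weighted fragment corrections could be combined. It is an illustration only, not the authors' implementation: the function name `assemble_energy`, the dictionary layout, and every numerical value are hypothetical placeholders, and in the paper the corrections $\Delta E_{\alpha_r,r}$ come from per-fragment neural networks rather than a lookup table.

```python
from typing import Dict, Tuple

# A fragment (simplex) is keyed by (rank r, tuple of node indices alpha_r).
# This keying scheme is an assumption made for this sketch.
Fragment = Tuple[int, Tuple[int, ...]]


def assemble_energy(
    e_ref: float,
    delta_e: Dict[Fragment, float],
    weights: Dict[Fragment, float],
    max_rank: int,
) -> float:
    """Return E_ref + sum over r <= max_rank of M[alpha_r, r] * dE[alpha_r, r]."""
    total = e_ref
    for (rank, nodes), correction in delta_e.items():
        if rank <= max_rank:
            total += weights[(rank, nodes)] * correction
    return total


# Placeholder numbers for a toy graph with two nodes and one edge.
E_REF = -10.0
DELTA_E = {(0, (0,)): -0.5, (0, (1,)): -0.25, (1, (0, 1)): -0.125}
WEIGHTS = {(0, (0,)): 1.0, (0, (1,)): 0.5, (1, (0, 1)): 1.0}

print(assemble_energy(E_REF, DELTA_E, WEIGHTS, max_rank=0))  # nodes only
print(assemble_energy(E_REF, DELTA_E, WEIGHTS, max_rank=1))  # nodes and edges
```

Truncating at a lower `max_rank` drops the higher-rank simplexes from the sum, which mirrors the role of $\mathcal{R}$ in the expression above.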

Abstract

Computing high-dimensional potential energy surfaces for molecular systems and materials is considered to be a great challenge in computational chemistry with potential impact in a range of areas including the fundamental prediction of reaction rates. In this paper, we design and discuss an algorithm that has similarities to large language models in generative AI and natural language processing. Specifically, we represent a molecular system as a graph which contains a set of nodes, edges, faces, etc. Interactions between these sets, which represent molecular subsystems in our case, are used to construct the potential energy surface for a reasonably sized chemical system with 51 nuclear dimensions. For this purpose, a family of neural networks that pertain to the graph-theoretically obtained subsystems get the job done for this 51 nuclear dimensional system. We then ask if this same family of lower-dimensional graph-based neural networks can be transformed to provide accurate predictions for a 186-dimensional potential energy surface. We find that our algorithm does provide accurate results for this larger-dimensional problem with sub-kcal/mol accuracy for the higher-dimensional potential energy surface problem. Indeed, as a result of these developments, here we produce the first efforts towards a full-dimensional potential energy surface for the protonated 21-water cluster (186 nuclear dimensions) at CCSD level accuracy.
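The two dimension counts quoted above are consistent with the standard $3N-6$ internal degrees of freedom of a nonlinear molecule with $N$ atoms. The short Python check below is a sketch under one stated assumption: that the 51-dimensional system is a protonated six-water cluster (19 atoms), which is inferred from the dimension count rather than stated in this summary. The helper names are hypothetical.

```python
def protonated_water_cluster_atoms(n_waters: int) -> int:
    """Atoms in H+(H2O)_n: three per water molecule plus the excess proton."""
    if n_waters < 1:
        raise ValueError("need at least one water molecule")
    return 3 * n_waters + 1


def internal_dimensions(n_atoms: int) -> int:
    """Internal (vibrational) degrees of freedom of a nonlinear molecule: 3N - 6."""
    if n_atoms < 3:
        raise ValueError("3N - 6 applies to nonlinear molecules with N >= 3")
    return 3 * n_atoms - 6


# Assumption: the 51-dimensional system corresponds to six waters plus a proton.
small = internal_dimensions(protonated_water_cluster_atoms(6))
large = internal_dimensions(protonated_water_cluster_atoms(21))
print(small, large)
```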

Paper Structure

This paper contains 22 sections, 38 equations, 36 figures, 4 tables.

Figures (36)

  • Figure 1: The density of gray edges represents the number of network weights that need to be computed to obtain a representation of the potential energy surface.
  • Figure 2: Visual illustration of neural networks used to compute $\left\{ \Delta E_{\alpha,r}^{ML}({\bf {\bar{x}}}) \right\}$ for $r=0$. See Eq. (\ref{eq_graph-ML-main}).
  • Figure 3: Visual illustration of neural networks used to compute $\left\{ \Delta E_{\alpha,r}^{ML}({\bf {\bar{x}}}) \right\}$ for $r=1$. See Eq. (\ref{eq_graph-ML-main}).
  • Figure 4: Illustration of sets and graphs described in Section \ref{sec_graph}. The water wire on top is represented as a graph, with nodes being individual water molecules, and the graph also defines the sets $A$, $B$ and $C$. See Figure \ref{fig_graph_complex} for a graphical representation of systems studied in this paper.
  • Figure 5: The graphical complexity for the two systems.
  • ...and 31 more figures