A large language model-type architecture for high-dimensional molecular potential energy surfaces
Xiao Zhu, Srinivasan S. Iyengar
TL;DR
The paper tackles the challenge of constructing accurate high-dimensional potential energy surfaces (PES) for large molecular systems at CCSD-level accuracy. It introduces a graph-theoretic fragmentation framework in which a molecule is partitioned into rank-$r$ simplexes and the full energy is reconstructed as a weighted sum of fragment corrections, $E^{target}(\bar{x}) = E^{Ref}(\bar{x}) + \sum_{r=0}^{\mathcal{R}} \sum_{\alpha_r \in {\bf V}_r} {\cal M}_{\alpha_r,r}^{\mathcal{R}} \Delta E_{\alpha_r,r}(\bar{x}_{\alpha_r,r})$, enabling a family of neural networks with drastically reduced complexity. The authors validate the approach on solvated Zundel ($\mathcal{D}=51$) and extend it to the protonated 21-water cluster ($\mathcal{D}=186$), achieving sub-kcal/mol accuracy for full PES and approaching CCSD-level accuracy through incremental transfer learning and an attention-like platform. By exploiting fragment-wise learning, geometric tessellation, and slice-by-slice domain expansion, the method attains CCSD-quality PES for a large hydrated system with substantially lower data and computational cost than direct high-level calculations, offering a scalable route to full-dimensional PES in complex hydrogen-bonded networks.
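The weighted sum in the expression above can be illustrated with a minimal Python sketch. It only shows the bookkeeping of adding weighted fragment corrections to a reference energy. The function name `assemble_target_energy`, the tuple-of-node-indices representation of a simplex, and all numerical values are assumptions made for illustration; they are not taken from the paper, and in the actual method the corrections would come from the fragment neural networks and the weights from the graph-theoretic construction.

```python
from typing import Dict, Tuple

# Hypothetical representation: a simplex is a tuple of node indices,
# so its rank is r = len(simplex) - 1 (node: r=0, edge: r=1, face: r=2, ...).
Simplex = Tuple[int, ...]


def assemble_target_energy(
    e_ref: float,
    delta_e: Dict[Simplex, float],
    weights: Dict[Simplex, float],
) -> float:
    """Return E_ref plus the weighted sum of fragment corrections."""
    missing = set(delta_e) - set(weights)
    if missing:
        raise KeyError(f"no weight for simplexes: {sorted(missing)}")
    return e_ref + sum(weights[s] * delta_e[s] for s in delta_e)


# Toy numbers chosen only for illustration (not from the paper):
# two nodes and the edge that joins them.
delta_e = {(0,): -0.50, (1,): -0.25, (0, 1): -1.00}
weights = {(0,): -1.0, (1,): -1.0, (0, 1): 1.0}
e_total = assemble_target_energy(-10.0, delta_e, weights)
print(e_total)
```

Keeping the weights and the corrections in separate dictionaries mirrors the structure of the formula: the weights depend only on the graph, while each correction depends on the geometry of its own fragment.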
Abstract
Computing high-dimensional potential energy surfaces for molecular systems and materials is a major challenge in computational chemistry, with potential impact in a range of areas including the fundamental prediction of reaction rates. In this paper, we design and discuss an algorithm that has similarities to large language models in generative AI and natural language processing. Specifically, we represent a molecular system as a graph that contains a set of nodes, edges, faces, etc. Interactions between these sets, which represent molecular subsystems in our case, are used to construct the potential energy surface for a reasonably sized chemical system with 51 nuclear dimensions. For this purpose, we use a family of neural networks that pertain to the graph-theoretically obtained subsystems. We then ask whether this same family of lower-dimensional graph-based neural networks can be transformed to provide accurate predictions for a 186-dimensional potential energy surface. We find that our algorithm does provide accurate results for this larger problem, with sub-kcal/mol accuracy. Indeed, as a result of these developments, here we produce the first efforts towards a full-dimensional potential energy surface for the protonated 21-water cluster (186 nuclear dimensions) at CCSD-level accuracy.
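The two dimension counts quoted above are consistent with the standard $3N - 6$ internal degrees of freedom of a nonlinear $N$-atom system. The short sketch below checks the arithmetic. It assumes that the solvated Zundel system is the protonated six-water cluster (19 atoms), which the text does not state explicitly, and the helper names are hypothetical.

```python
def protonated_water_atoms(n_waters: int) -> int:
    """Atom count of a protonated water cluster H+(H2O)_n: 3 per water plus one proton."""
    return 3 * n_waters + 1


def internal_dimensions(n_atoms: int) -> int:
    """Internal degrees of freedom, 3N - 6, of a nonlinear N-atom system."""
    if n_atoms < 3:
        raise ValueError("3N - 6 applies to nonlinear systems with N >= 3")
    return 3 * n_atoms - 6


# Assumption: solvated Zundel taken as H+(H2O)_6, i.e. 19 atoms.
dims_solvated_zundel = internal_dimensions(protonated_water_atoms(6))
# Protonated 21-water cluster, H+(H2O)_21, i.e. 64 atoms.
dims_protonated_21 = internal_dimensions(protonated_water_atoms(21))
print(dims_solvated_zundel, dims_protonated_21)  # 51 186
```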
