Machine-learning interatomic potentials achieving CCSD(T) accuracy for systems with extended covalent networks and van der Waals interactions

Yuji Ikeda, Axel Forslund, Pranav Kumar, Yongliang Ou, Jong Hyun Jung, Andreas Köhn, Blazej Grabowski

TL;DR

This work addresses the challenge of attaining CCSD(T)-level accuracy for materials with extended covalent networks and vdW interactions by developing a Δ-learning interatomic potential built on a dispersion-corrected tight-binding baseline. The authors train a Moment Tensor Potential (MTP) to predict CCSD(T)-F12 energy corrections on molecular fragments and combine it with GFN2-xTB baseline energies, enabling CCSD(T)-quality potential energy surfaces for periodic systems such as covalent organic frameworks (COFs). The TB+$\Delta$MTP model achieves RMSEs below 0.4 meV/atom on the training and test sets and reproduces electronic total atomization energies, bond lengths, vibrational frequencies, and intermolecular interaction energies with CCSD(T)-level accuracy. This is demonstrated for H2, C6H6, and the C48H30 COF, including inter-layer binding and H2 adsorption. The methodology provides a practical route to large-scale, CCSD(T)-accurate simulations and high-throughput screening of vdW-dominated materials such as COFs.

Abstract

Machine-learning interatomic potentials (MLIPs) enable large-scale atomistic simulations at moderate computational cost while retaining ab initio accuracy. MLIPs trained on coupled-cluster data, particularly CCSD(T), have emerged as a promising route to achieve chemical accuracy beyond the limits of density functional theory (DFT) and to incorporate non-empirical van der Waals (vdW) interactions. Most existing approaches are, however, still not straightforwardly applicable to systems with extended covalent networks such as covalent organic frameworks (COFs) due to the limited availability of CCSD(T) for periodic systems. Here we present a methodology to train MLIPs with CCSD(T) accuracy for these systems. The approach uses the Δ-learning method with a dispersion-corrected tight-binding baseline. This strategy enables training on compact molecular fragments while preserving transferability toward the periodic systems. Dispersion interactions are accounted for by adding vdW-bound multimers to the training set, and the combination with a vdW-aware tight-binding baseline allows the formally local MLIP to attain CCSD(T)-level accuracy even for systems dominated by long-range vdW forces. The resulting potential yields root-mean-square energy errors below 0.4 meV/atom on training and test sets and reproduces electronic total atomization energies, bond lengths, harmonic vibrational frequencies, and inter-molecular interaction energies for benchmark molecular systems. We apply the method to a prototypical quasi-two-dimensional COF composed of carbon and hydrogen. The COF structure, inter-layer binding energies, and hydrogen absorption are analyzed at CCSD(T) accuracy. The developed methodology opens a practical route to large-scale atomistic simulations for systems with extended covalent networks and vdW interactions with chemical accuracy.
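
The Δ-learning idea described in the abstract can be illustrated with a minimal toy sketch: a correction model is fitted only to the difference between an expensive reference and a cheap baseline, and predictions are formed as baseline plus learned correction. Everything in the block is an assumption made for illustration. `baseline_energy` and `reference_energy` are hypothetical one-dimensional stand-ins for the tight-binding and CCSD(T) energies, and the linear least-squares fit stands in for the MTP; none of this is the authors' implementation.

```python
from statistics import mean


def baseline_energy(r: float) -> float:
    """Hypothetical cheap baseline (toy stand-in for a tight-binding energy)."""
    return 2.0 * (r - 1.0) ** 2


def reference_energy(r: float) -> float:
    """Hypothetical expensive reference (toy stand-in for a CCSD(T) energy)."""
    return 2.0 * (r - 1.0) ** 2 + 0.05 - 0.03 * r


def fit_linear(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# The correction model is trained on (reference - baseline), not on the
# reference energies themselves.
train_r = [0.8, 0.9, 1.0, 1.1, 1.2]
train_delta = [reference_energy(r) - baseline_energy(r) for r in train_r]
a, b = fit_linear(train_r, train_delta)


def delta_learned_energy(r: float) -> float:
    """Prediction = baseline energy + learned correction."""
    return baseline_energy(r) + (a + b * r)


# Evaluate on geometries that were not part of the training set.
test_r = [0.85, 1.05, 1.3]
squared_errors = [(delta_learned_energy(r) - reference_energy(r)) ** 2 for r in test_r]
rmse = mean(squared_errors) ** 0.5
print(f"correction: {a:.4f} + {b:.4f} * r, test RMSE: {rmse:.2e}")
```

In this toy case the correction is exactly linear, so the fit recovers it to numerical precision. The point of the construction is that the correction is typically smoother and smaller than the total energy, which is what makes training on compact fragments plausible.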

Paper Structure

This paper contains 17 sections, 8 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: (a) Single layer of the periodic C48H30 COF. (b) Monomer molecules considered for the training of MTPs. (c) Dihydrogen–benzene dimer and benzene trimer with the center-of-mass distances $d$ as examples of multimers also considered for the training. (d) Molecular systems used for testing the trained MTPs, thus not included in the training datasets. Visualization was performed using OVITO [Stukowski_MSMSE_2010_Visualization].
  • Figure 2: (a) RMSEs of the energies per atom for the molecular systems in the training and the validation datasets predicted by $\Delta$MTPs. (b) RMSEs of the energies per atom for the molecular systems in the test dataset. (c) Local extrapolation grades of a selected snapshot from each molecular system in the test dataset obtained with $\Delta$MTPs at level 20.
  • Figure 3: RMSEs of cohesive energies per atom for the training datasets in GFN-FF [Spicher_ACIE_59_15665_2020], GFN2-xTB [Bannwarth_JCTC_2019_GFN2], PBE-D4 [Perdew_PRL_77_3865_1996, Perdew_PRL_78_1396_1997, Caldeweyher_JCP_2019_generally], as well as TB+$\Delta$MTP#5 at the MTP level 20 with respect to the reference PNO-LCCSD(T)-F12 values.
  • Figure 4: Equilibrium bond length of dihydrogen (H2) with respect to the experimental value in Huber and Herzberg [Huber_Book_1979_Molecular]. The heavy-aug-cc-pVTZ basis set was used for quantum-chemical calculations. Note that H2 has only two electrons and therefore no triple excitations. The PNO-LCCSD(T)-F12 value was obtained by fitting the energies for H–H bond lengths on a 0.001 grid to a univariate second-order polynomial.
  • Figure 5: Equilibrium bond lengths of benzene (C6H6) with respect to the experimental values of Heo et al. [Heo_JPCL_2022_Mass] and Esselman et al. [Esselman_JACS_2023_Precise]. The heavy-aug-cc-pVTZ basis set was used for quantum-chemical calculations. The PNO-LCCSD(T)-F12 values were obtained by fitting the energies for C–C and C–H bond lengths on a 0.001 grid to a bivariate second-order polynomial.
  • ...and 7 more figures
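
The caption of Figure 4 mentions obtaining an equilibrium bond length by fitting energies on a fine grid to a univariate second-order polynomial. A minimal sketch of that kind of fit is given below. The energy function, the force constant `k`, the minimum position `r0`, and the grid center `r_mid` are made-up toy values (not data from the paper), and length units are left unspecified. The grid is expressed as integer steps around its center so that the normal equations stay well conditioned.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(us, es):
    """Least-squares fit of e = c0 + c1*u + c2*u**2 via the normal equations."""
    s = [sum(u ** k for u in us) for k in range(5)]               # power sums of u
    t = [sum(e * u ** k for u, e in zip(us, es)) for k in range(3)]
    m = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]
    d = det3(m)
    coeffs = []
    for j in range(3):
        # Cramer's rule: replace column j with the right-hand side.
        mj = [[t[i] if col == j else m[i][col] for col in range(3)] for i in range(3)]
        coeffs.append(det3(mj) / d)
    return coeffs


# Toy energy curve (assumption): harmonic well with minimum at r0.
k, r0 = 35.0, 0.7413


def toy_energy(r):
    return 0.5 * k * (r - r0) ** 2


h = 0.001              # grid spacing, matching the 0.001 grid in the caption
r_mid = 0.741          # hypothetical grid center
steps = list(range(-5, 6))                       # integer offsets from the center
energies = [toy_energy(r_mid + h * u) for u in steps]

c0, c1, c2 = fit_quadratic(steps, energies)
u_min = -c1 / (2.0 * c2)                         # vertex of the fitted parabola
r_eq = r_mid + h * u_min
print(f"fitted equilibrium bond length: {r_eq:.6f}")
```

For benzene (Figure 5) the caption describes a bivariate second-order polynomial in the C–C and C–H bond lengths. The same least-squares idea applies there with additional cross and quadratic terms, which this sketch does not cover.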