$Δ$-ML Ensembles for Selecting Quantum Chemistry Methods to Compute Intermolecular Interactions

Austin M. Wallace; C. David Sherrill; Giri P. Krishnan

TL;DR

Problem: selecting accurate yet affordable quantum chemistry methods for intermolecular interactions. Approach: a $Δ$-ML ensemble trained on AP-Net2 embeddings to predict the error $ΔE_{\rm pred}$ between any level of theory and the reference $E_{\rm IE,ref}$ (CCSD(T)/CBS/CP), supplemented by compute-time estimators to prune expensive options. Contributions: a demonstration on BFDB-Ext with $80 \times 80$ level-of-theory mappings achieving $\mathrm{MAE} < 0.1$ kcal/mol, with concrete corrections such as HF/aug-cc-pVDZ/CP from $2.89$ to $0.08$ kcal/mol and MP2/aug-cc-pVQZ/CP from $0.21$ to $0.02$ kcal/mol; dendrograms showing alignment with theoretical hierarchies; and time-based filtering that enables practical large-scale screening. Significance: enables data-driven, scalable selection of levels of theory for screening in materials and drug discovery.
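
To make the quantities in the summary above concrete, the correction can be written out as follows. This is a sketch under an assumed sign convention (reference minus method), which the summary does not state; $E_{\rm IE,method}$, $E_{\rm IE,corr}$, and the index $i$ over $N$ dimers are notation introduced here for illustration only.

$$
ΔE_{\rm pred} \approx E_{\rm IE,ref} - E_{\rm IE,method}, \qquad
E_{\rm IE,corr} = E_{\rm IE,method} + ΔE_{\rm pred}, \qquad
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| E_{\rm IE,corr}^{(i)} - E_{\rm IE,ref}^{(i)} \right|
$$

Read this way, a quoted pair such as $2.89$ and $0.08$ kcal/mol for HF/aug-cc-pVDZ/CP would correspond to the MAE before and after applying the predicted correction.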

Abstract

Ab initio quantum chemical methods for accurately computing interactions between molecules have a wide range of applications but are often computationally expensive. Because the performance of these methods varies, selecting an appropriate one by balancing accuracy against computational cost remains a significant challenge. In this work, we propose a framework based on an ensemble of $Δ$-ML models, trained on features extracted from a pre-trained atom-pairwise neural network, to predict the error of each method relative to all other methods, including the "gold standard" coupled cluster with single, double, and perturbative triple excitations at the estimated complete basis set limit [CCSD(T)/CBS]. Our approach provides error estimates across various levels of theory and identifies a computationally efficient method for a given error range, using only a subset of the dataset. It also allows comparison between levels of theory. We demonstrate the effectiveness of our approach using an extended BioFragment dataset, which includes the interaction energies for common biomolecular fragments and small organic dimers. Our results show that the proposed framework achieves mean absolute errors below 0.1 kcal/mol regardless of the given method. Furthermore, by analyzing all-to-all $Δ$-ML models for the levels of theory present in the dataset, we identify method groupings that align with theoretical hypotheses, providing evidence that $Δ$-ML models can easily learn corrections from any level of theory to any other level of theory.
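
The selection step described in the abstract (pick an efficient method for a given error range, after pruning options that are too expensive) can be sketched as below. This is a minimal illustration, not the authors' implementation: the `Candidate` class, the `select_level` function, and the timing values are hypothetical, and the error values simply reuse the uncorrected MAEs quoted in the TL;DR as stand-ins for model-predicted errors.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One level of theory with its predicted error and estimated cost (hypothetical type)."""
    level: str
    predicted_abs_error: float  # kcal/mol, relative to the reference
    estimated_seconds: float    # output of some compute-time estimator


def select_level(
    candidates: Sequence[Candidate],
    max_error: float,
    max_seconds: Optional[float] = None,
) -> Optional[Candidate]:
    """Return the cheapest candidate whose predicted error is within max_error.

    Candidates slower than max_seconds (if given) are pruned first.
    Returns None when no candidate satisfies both constraints.
    """
    affordable = [
        c for c in candidates
        if max_seconds is None or c.estimated_seconds <= max_seconds
    ]
    accurate = [c for c in affordable if c.predicted_abs_error <= max_error]
    if not accurate:
        return None
    # Cheapest first; ties broken by the smaller predicted error.
    return min(accurate, key=lambda c: (c.estimated_seconds, c.predicted_abs_error))


# Placeholder inputs: errors reuse the uncorrected MAEs from the TL;DR,
# timings are invented purely for illustration.
CANDIDATES = [
    Candidate("HF/aug-cc-pVDZ/CP", predicted_abs_error=2.89, estimated_seconds=10.0),
    Candidate("MP2/aug-cc-pVQZ/CP", predicted_abs_error=0.21, estimated_seconds=500.0),
]

if __name__ == "__main__":
    choice = select_level(CANDIDATES, max_error=0.5)
    print(choice.level if choice else "no level of theory meets the constraints")
```

Filtering on time before accuracy mirrors the pruning role the TL;DR assigns to the compute-time estimators; returning `None` rather than the "least bad" option keeps a failed constraint visible to the caller.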
