Accurate molecular polarizabilities with coupled-cluster theory and machine learning

David M. Wilkins, Andrea Grisafi, Yang Yang, Ka Un Lao, Robert A. DiStasio, Michele Ceriotti

TL;DR

The paper addresses the accurate prediction of the molecular polarizability $\boldsymbol{\alpha}$, a tensor that governs induction and dispersion interactions and is difficult to compute reliably with standard electronic-structure methods. The authors benchmark LR-CCSD polarizabilities on the QM7b dataset using the $d$-aug-cc-pVDZ basis set and compare them to DFT. They then introduce ALPHA-ML, a symmetry-adapted Gaussian process regression model based on $\lambda$-SOAP descriptors, which predicts the full tensor $\boldsymbol{\alpha}$ with LR-CCSD-level accuracy at a fraction of the cost. Delta-learning from a DFT baseline improves performance, and an atom-centered decomposition adds interpretability; the model reaches near-CCSD accuracy on validation and extrapolates successfully to 52 larger molecules. This approach offers a scalable route to accurate polarizable force fields and spectroscopy-informed predictions in large systems.

Abstract

The molecular polarizability describes the tendency of a molecule to deform or polarize in response to an applied electric field. As such, this quantity governs key intra- and inter-molecular interactions such as induction and dispersion, plays a key role in determining the spectroscopic signatures of molecules, and is an essential ingredient in polarizable force fields and other empirical models for collective interactions. Compared to other ground-state properties, an accurate and reliable prediction of the molecular polarizability is considerably more difficult as this response quantity is quite sensitive to the description of the underlying molecular electronic structure. In this work, we present state-of-the-art quantum mechanical calculations of the static dipole polarizability tensors of 7,211 small organic molecules computed using linear-response coupled-cluster singles and doubles theory (LR-CCSD). Using a symmetry-adapted machine-learning based approach, we demonstrate that it is possible to predict the molecular polarizability with LR-CCSD accuracy at a negligible computational cost. The employed model is quite robust and transferable, yielding molecular polarizabilities for a diverse set of 52 larger molecules (which includes challenging conjugated systems, carbohydrates, small drugs, amino acids, nucleobases, and hydrocarbon isomers) at an accuracy that exceeds that of hybrid density functional theory (DFT). The atom-centered decomposition implicit in our machine-learning approach offers some insight into the shortcomings of DFT in the prediction of this fundamental quantity of interest.

Paper Structure

This paper contains 2 sections, 3 equations, 5 figures, 1 table.

Table of Contents

  1. Introduction
  2. Methods

Figures (5)

  • Figure 1: Learning curves for the per-atom polarizabilities of the QM7b molecules, calculated using either CCSD or DFT, as well as for the difference ($\Delta$) between the two. The testing set consists of 1811 molecules, and the right-hand axis shows the RMSE as a fraction of the intrinsic variability of the CCSD polarizability, $\sigma_\text{CCSD}$.
  • Figure 2: List of molecules included in the showcase dataset. Numbers refer to the position in the dataset and are used for reference in other figures.
  • Figure 3: Error made in approximating the $\lambda=0$ (bottom panel) and $\lambda=2$ (top panel) components of the average polarizability per atom for the 52 showcase molecules, as a function of the molecule indices in Fig. 2. Vertical lines show the partitioning of these molecules into different groups. Red squares show the machine-learning error, blue circles the error made in using the DFT polarizability to approximate the CCSD polarizability, and black crosses the error when $\Delta$-learning of the correction to the DFT polarizability is used.
  • Figure 4: Predicted atomic contributions to the total CCSD polarizability tensor for a selection of molecules in the showcase set. The ellipsoids are aligned along the principal axes of the atomic polarizability, and their extent is proportional to the square root of the corresponding eigenvalue of $\boldsymbol{\alpha}_i$. The ellipsoids have dimensions that are proportional to the modulus of the square root of the eigenvalues of $\Delta\boldsymbol{\alpha}_i$. The principal axes are shown, and are colored according to whether the corresponding eigenvalues are positive (black) or negative (red). See also the figure key (not to scale).
  • Figure 5: Top: distributions of the predicted atomic contribution to the $\lambda=0$ component of the difference between DFT and CCSD polarizability. Bottom: example decompositions of the polarizability difference. The ellipsoids represent the magnitude and principal axes of $\Delta\boldsymbol{\alpha}_i$. Black axes indicate that the DFT polarizability is larger than CCSD, red axes that the DFT polarizability is smaller. See the figure key in Fig. 4.
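
The captions above refer to the $\lambda=0$ and $\lambda=2$ components of the polarizability and to ellipsoids built from its eigenvalues and principal axes. The following is a minimal Python sketch of those two operations for a single symmetric $3\times3$ tensor. It is not the authors' implementation: the function names and the example tensor are hypothetical, and the Cartesian split into an isotropic part and a traceless part is assumed to correspond to the $\lambda=0$ and $\lambda=2$ spherical components only up to normalization conventions.

```python
import numpy as np


def split_polarizability(alpha):
    """Split a symmetric 3x3 tensor into an isotropic scalar (lambda=0-like)
    and a traceless symmetric remainder (lambda=2-like).

    Hypothetical helper for illustration; normalization conventions for
    spherical components may differ from those used in the paper.
    """
    alpha = np.asarray(alpha, dtype=float)
    alpha = 0.5 * (alpha + alpha.T)          # enforce exact symmetry
    iso = np.trace(alpha) / 3.0              # mean polarizability
    anisotropic = alpha - iso * np.eye(3)    # traceless part
    return iso, anisotropic


def ellipsoid_axes(alpha):
    """Return semi-axis lengths, eigenvalue signs, and principal axes.

    Lengths are sqrt(|eigenvalue|), following the caption's description;
    the signs let negative eigenvalues (e.g. of a difference tensor) be
    drawn in a different color.
    """
    alpha = np.asarray(alpha, dtype=float)
    evals, evecs = np.linalg.eigh(0.5 * (alpha + alpha.T))
    return np.sqrt(np.abs(evals)), np.sign(evals), evecs


# Made-up tensor in arbitrary units, purely for illustration.
alpha_example = np.array([[10.0, 1.0, 0.0],
                          [1.0, 8.0, 0.0],
                          [0.0, 0.0, 6.0]])

iso, aniso = split_polarizability(alpha_example)
lengths, signs, axes = ellipsoid_axes(alpha_example)
print("isotropic part:", iso)
print("trace of anisotropic part:", np.trace(aniso))
print("semi-axis lengths:", lengths)
```

Applying `ellipsoid_axes` to a difference tensor such as $\Delta\boldsymbol{\alpha}_i$ can give negative entries in `signs`, which is the case the red/black axis coloring in the captions distinguishes.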