Table of Contents
Fetching ...

Efficient first-principles modeling of complex molecular crystals at sub-chemical accuracy

Benjamin X. Shi, Kristina M. Herman, Flaviano Della Pia, Venkat Kapil, Andrea Zen, Peter R. Nagy, Sotiris Xantheas, Angelos Michaelides

Abstract

Molecules can form myriad crystalline polymorphs, each with distinct properties affecting their performance across diverse applications, from pharmaceuticals to functional materials and more. Predicting the thermodynamically most stable polymorph from first principles remains a formidable challenge. It requires methods that scale to large, technologically-relevant molecules while achieving very high accuracy (below 1 kJ/mol) on relative lattice energies. Such accuracy, often termed sub-chemical accuracy, is generally beyond the reach of the workhorse density functional theory (DFT). In this work, we introduce a framework, combining advances in correlated wavefunction theory (cWFT) and the many-body expansion, to deliver accurate, cost-effective predictions of complex molecular crystals. For 23 organic molecules and 13 ice polymorphs, we predict crystal lattice energies to within experimental uncertainties at costs comparable to hybrid DFT, while being several orders of magnitude more efficient than previous cWFT approaches. We extend this approach to a set of large, drug-like molecules including axitinib and ROY, previously inaccessible to cWFT and where DFT is insufficient, achieving sub-chemical accuracy on the relative energies between challenging polymorphs. With the reference data generated throughout this work, we have been able to further parametrize a DFT functional with unprecedented accuracy aligning with our predictions. This cWFT framework as well as DFT functional are made openly available, providing new ranking tools to facilitate efficient high-throughput screening of molecular crystal polymorphs.

Efficient first-principles modeling of complex molecular crystals at sub-chemical accuracy

Abstract

Molecules can form myriad crystalline polymorphs, each with distinct properties affecting their performance across diverse applications, from pharmaceuticals to functional materials and more. Predicting the thermodynamically most stable polymorph from first principles remains a formidable challenge. It requires methods that scale to large, technologically-relevant molecules while achieving very high accuracy (below 1 kJ/mol) on relative lattice energies. Such accuracy, often termed sub-chemical accuracy, is generally beyond the reach of the workhorse density functional theory (DFT). In this work, we introduce a framework, combining advances in correlated wavefunction theory (cWFT) and the many-body expansion, to deliver accurate, cost-effective predictions of complex molecular crystals. For 23 organic molecules and 13 ice polymorphs, we predict crystal lattice energies to within experimental uncertainties at costs comparable to hybrid DFT, while being several orders of magnitude more efficient than previous cWFT approaches. We extend this approach to a set of large, drug-like molecules including axitinib and ROY, previously inaccessible to cWFT and where DFT is insufficient, achieving sub-chemical accuracy on the relative energies between challenging polymorphs. With the reference data generated throughout this work, we have been able to further parametrize a DFT functional with unprecedented accuracy aligning with our predictions. This cWFT framework as well as DFT functional are made openly available, providing new ranking tools to facilitate efficient high-throughput screening of molecular crystal polymorphs.
Paper Structure (15 sections, 5 figures)

This paper contains 15 sections, 5 figures.

Figures (5)

  • Figure 1: Reliable and efficient procedure for molecular crystal energetics. We employ a divide-and-conquer approach to reach 'gold-standard' CCSD(T) estimates of the lattice $E_\text{latt}$ and relative $E_\text{rel}$ energies (shown schematically in the top panel and defined in the Methods). As a post-Hartree-Fock (HF) method, the CCSD(T) $E_\text{latt}$ is composed of a (mean-field) HF and an (electron-electron) correlation contribution. We treat HF under periodic boundary conditions (PBC) while the correlation energy is treated with the many-body expansion (MBE). This division provides a natural description for each component, allowing us to leverage new developments in solid-state modeling under PBCs together with molecular quantum chemistry for the MBE.
  • Figure 2: Agreement across diverse molecular crystals. Comparison of the lattice energy predicted by LNO-MBE-CCSD(T) against previous DMC estimates dellapiaHowAccurateAre2024 as well as experimental measurements for the 23 organic molecular crystals in the X23 dataset. The full dataset is split into crystals held primarily by dispersion, hydrogen-bonding, or a mixture in the top, bottom and middle panels, respectively. The experimental sublimation enthalpies were converted to lattice energies using corrections computed with DFT-trained machine-learning interatomic potentials in Ref. dellapiaAccurateEfficientMachine2025 and expressed as a range based on the set of previous measurements (see Supplementary Section \ref{['si-sec:x23_exp_analysis']}).
  • Figure 3: Agreement across 13 polymorphs of ice. Comparison of the lattice energy predicted by LNO-MBE-CCSD(T) against previous DMC estimates dellapiaDMCICE13AmbientHigh2022b and experiments whalleyEnergiesPhasesIce1984 (when available) for the 13 ice polymorphs in the ICE13 dataset. The lattice energy for a range of widely-used density functional approximations [B3LYP-D4, revPBE0-D3, PBE0-MBD, SCAN+rVV10, optB86b-vdW, rev-vdW-DF2, vdW-DF, revPBE-D3, PBE-D3(BJ)] from Ref. dellapiaDMCICE13AmbientHigh2022b is illustrated in purple. We extend beyond standard small ferroelectric ordered cells ($6{-}16$ water molecules) to increasingly hydrogen-disordered antiferroelectric cells of medium ($12{-}16$ molecules) and large (${>}80$ molecules) sizes.
  • Figure 4: High accuracy benchmarks at affordable costs.a Comparison of the cost of LNO-MBE-CCSD(T) against periodic DFT, for both the generalized gradient approximation (GGA) and hybrids, as well as DMC. We report LNO-MBE-CCSD(T) results using low, moderate, and high settings defined by different two- and three-body cutoff distances, where the low and moderate settings have a mean absolute deviation (MAD) of 1 and $2\,$kJ/mol to the high settings, respectively. DMC costs are also reported for different levels of statistical sampling corresponding to 95% confidence intervals (2$\sigma$) of 1, 2, and $4\,$kJ/mol. The DMC costs neglect the costs to generate the trial wave-function and optimize the Jastrows while the CCSD(T) cost is only for the correlation energy. b We have computed the mean absolute deviation (MAD) to our LNO-MBE-CCSD(T) estimates for a wide set of density functional approximations to benchmark their performance on the ICE13 and X23 datasets. These benchmarks have been further used to reparameterize the XDM coefficients of the B86bPBE50 functional, yielding B86bPBE50-revXDM with unprecedented accuracy across both X23 and ICE13.
  • Figure 5: Sub-chemical accuracy on pharmaceutical-scale polymorph pairs. Comparison of the relative energy predicted by LNO-MBE-CCSD(T) against experiments (where the gray bar indicates uncertainties) for competing polymorphs of a ROY yuPolymorphismMolecularSolids2010b (Y and R), b rotigotine mortazaviComputationalPolymorphScreening2019 (II and I) and c axitinib campetaDevelopmentTargetedPolymorph2010a (XLI and VI). The experimentally observed form is given to the left, with a negative relative energy favoring its formation while a positive value favoring the (incorrect) metastable form. We also report the predictions for a set of commonly-employed density functional approximations and also the B86PBE50-revXDM parametrized in this work.