Machine learning the two-electron reduced density matrix in molecules and condensed phases

Jessica A. Martinez B., Bhaskar Rana, Xuecheng Shao, Katarzyna Pernal, Michele Pavanello

TL;DR

This work develops surrogates for correlated wavefunction methods that yield 2-RDMs with sufficient fidelity to provide direct, training-free access to energies and forces for driving energy-conserving molecular dynamics.

Abstract

Machine learning is rapidly accelerating materials and chemical discovery, but most current models target energies, forces, or selected molecular properties rather than the underlying many-body electronic structure. Learning electronic-structure proxies, such as reduced density matrices, offers a path to surrogates that can predict a broad range of observables from a single ML model. Short of learning the full wavefunction, the two-electron reduced density matrix (2-RDM) is among the most information-rich, minimally lossy targets, providing direct access to expectation values of arbitrary one- and two-electron operators regardless of the strength of the underlying electron correlation. Here we show that learning the 2-RDM is a feasible goal, yielding exceptionally accurate models. We develop surrogates for correlated wavefunction methods (including configuration interaction and coupled cluster) that yield 2-RDMs with sufficient fidelity to provide direct, training-free access to energies and forces for driving energy-conserving molecular dynamics. To tackle realistic molecular condensed phases, we leverage a many-body expansion of the 2-RDM, using our ML models to supply the expansion terms and enabling ML-powered, coupled-cluster-quality electronic structure and energetics for large solvated systems. As a demonstration, we showcase a coupled-cluster-level electronic-structure calculation of glucose solvated by 500 water molecules achieved at Hartree-Fock cost. This work establishes a general framework for learning correlated electronic structure with high fidelity and deploying it to systems beyond the reach of conventional ab initio methods.
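To illustrate why a predicted 2-RDM gives "training-free" access to the energy, the following minimal sketch contracts a 1-RDM and 2-RDM with one- and two-electron integrals. It is not the authors' implementation. The index convention (spin-orbital basis, $\Gamma_{pqrs} = \langle a^\dagger_p a^\dagger_q a_s a_r \rangle$, physicist-notation integrals $\langle pq|rs\rangle$), the function names, and the random placeholder integrals are all assumptions made for this example. A single-determinant 2-RDM stands in for an ML-predicted one.

```python
import numpy as np


def hf_rdms(n_orb, n_occ):
    """RDMs of a single determinant occupying the first n_occ spin orbitals.

    Stand-in for a predicted 2-RDM (assumed convention):
    Gamma_pqrs = <a+_p a+_q a_s a_r> = gamma_pr gamma_qs - gamma_ps gamma_qr.
    """
    gamma = np.zeros((n_orb, n_orb))
    idx = np.arange(n_occ)
    gamma[idx, idx] = 1.0
    Gamma = (np.einsum("pr,qs->pqrs", gamma, gamma)
             - np.einsum("ps,qr->pqrs", gamma, gamma))
    return gamma, Gamma


def energy_from_rdms(h, eri, gamma, Gamma):
    """E = sum_pq h_pq gamma_pq + 1/2 sum_pqrs <pq|rs> Gamma_pqrs."""
    e_one = np.einsum("pq,pq->", h, gamma)
    e_two = 0.5 * np.einsum("pqrs,pqrs->", eri, Gamma)
    return e_one + e_two


def random_integrals(n_orb, seed=0):
    """Random placeholder 'integrals' (hypothetical data, not a real molecule)."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(n_orb, n_orb))
    h = 0.5 * (h + h.T)  # symmetric one-electron matrix
    eri = rng.normal(size=(n_orb,) * 4)
    eri = eri + eri.transpose(1, 0, 3, 2)  # <pq|rs> = <qp|sr>
    eri = eri + eri.transpose(2, 3, 0, 1)  # <pq|rs> = <rs|pq>
    return h, eri


if __name__ == "__main__":
    h, eri = random_integrals(n_orb=6)
    gamma, Gamma = hf_rdms(n_orb=6, n_occ=3)
    print("E =", energy_from_rdms(h, eri, gamma, Gamma))
```

Because the energy is a linear functional of the RDMs, any other one- or two-electron observable follows from the same contraction with a different set of integrals, which is the sense in which a single 2-RDM model can serve many properties.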

Paper Structure (12 sections, 12 equations, 8 figures)

Figures (8)

  • Figure 1: Depiction of the workflows for the $\Gamma_{\mathrm{ML}}$, $\Gamma^c_{\mathrm{ML}}$ and $\Delta_{\mathrm{ML}}$ ML models.
  • Figure 2: Potential energy curve (PEC) of a water molecule (FCI) as a function of the length of one OH bond, with the other fixed at its equilibrium length. (a) Comparison of three ML models. (b) Effect of purification on the 2-RDM predictions. In all panels, the gray histograms show the distribution of training-set configurations. All models used the RBF kernel. For results generated with the linear kernel, see Figure \ref{sup-eoh_lin}.
  • Figure 3: Total and potential energies in kcal/mol along an NVE trajectory for ammonia (top: total energy over 10 ps; bottom: potential energy in the interval 5.0 ps to 5.1 ps), with initial velocities sampled from a Maxwell--Boltzmann distribution at $T=300$ K. The CCSD benchmark is shown in black. The 1-RDM-based ML models, $\gamma_{\mathrm{ML}}$, are shown in green ($N_{\mathrm{train}}=216$) and orange ($N_{\mathrm{train}}=1200$). The $\Delta_{\mathrm{ML}}$ model is shown as a dotted red line and $\Gamma^{c}_{\mathrm{ML}}$ as a dotted blue line (both trained with $N_{\mathrm{train}}=216$ structures). Although the potential energy is not conserved in the NVE ensemble, this short time window highlights differences in its fluctuations across models. The $\gamma_{\mathrm{ML}}$ model with $N_{\mathrm{train}}=216$ exhibits a noticeable drift. All other models closely track the CCSD benchmark.
  • Figure 4: Effect of molecular vibrations on the electronic structure factor of gas-phase ammonia, $S_e(q)$, where $q=|{\mathbf{q}}|$. We evaluate $S_e(q)$ using 2-RDMs predicted by the $\Delta_{\mathrm{ML}}$ model for 600 randomly sampled structures consistent with temperatures of 300 K and 700 K. Reference structure factors are taken from Ref. zotev_excited_2020. Top row: (a) total $S_e(q)$ (elastic + inelastic), (b) inelastic, and (c) elastic contributions. Bottom row: corresponding contributions to $S_e(q)$ from the correlated part of the 2-RDM.
  • Figure 5: Ethylene as an example of strong correlation. (a) PEC of ethylene vs. C=C bond length; (b) 1-RDM occupations; (c) energy vs. HCCH dihedral angle. In all panels, the gray histograms show the distribution of training-set configurations. Vertical lines in panels (a) and (b) mark the C=C distances of three geometries added to the training set to give the model a notion of dissociation.
  • ...and 3 more figures
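
The energy drift mentioned in the Figure 3 caption can be quantified in several ways. One common choice, sketched below, is the least-squares slope of the total energy over the trajectory together with the RMS fluctuation about that fitted line. This is an illustrative helper only: the function name, the units, and the synthetic trajectory are assumptions, and the source does not state how the authors measured drift.

```python
import numpy as np


def energy_drift(time_ps, total_energy):
    """Return (slope, rms) for a total-energy time series.

    slope: least-squares linear drift, in energy units per ps.
    rms:   root-mean-square fluctuation about the fitted line.
    """
    t = np.asarray(time_ps, dtype=float)
    e = np.asarray(total_energy, dtype=float)
    slope, intercept = np.polyfit(t, e, 1)  # degree-1 (linear) fit
    residual = e - (slope * t + intercept)
    return float(slope), float(np.sqrt(np.mean(residual ** 2)))


if __name__ == "__main__":
    # Synthetic 10 ps trajectory (hypothetical numbers, not data from the paper):
    # a small linear drift plus an oscillating fluctuation.
    t = np.linspace(0.0, 10.0, 1001)
    e_tot = 1.0 + 0.05 * t + 0.01 * np.sin(2.0 * np.pi * 5.0 * t)
    slope, rms = energy_drift(t, e_tot)
    print(f"drift = {slope:.4f} per ps, rms fluctuation = {rms:.4f}")
```

A slope close to zero, relative to the RMS fluctuation, indicates an energy-conserving trajectory on the sampled time scale.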