Multi-task learning for molecular electronic structure approaching coupled-cluster accuracy
Hao Tang, Brian Xiao, Wenhao He, Pero Subasic, Avetik R. Harutyunyan, Yao Wang, Fang Liu, Haowei Xu, Ju Li
TL;DR
The paper introduces a multi-task equivariant graph neural network that predicts electronic properties at CCSD(T)-level accuracy. The network augments a starting Hamiltonian ${\mathbf F}'$ from local DFT with a learned correction ${\mathbf V}^{\theta}$ to form an effective Hamiltonian ${\mathbf H}^{\rm eff}$. Trained on CCSD(T) data for hydrocarbons, the model predicts energies, dipoles, quadrupoles, charges, bond orders, and excited-state properties such as the energy gap $E_g$ and polarizability $\alpha$. Perturbation-theory-based back-propagation keeps gradients stable through the electronic eigenproblem. The approach shows higher accuracy and data efficiency on both in-domain and out-of-domain molecules, including aromatic systems and large semiconducting polymers, and it runs substantially faster than CCSD(T) and widely used hybrid and double-hybrid DFT functionals. Combining an effective Hamiltonian correction with multi-task learning gives a practical, physics-informed route to CCSD(T)-level predictions at near-linear scaling for complex molecular systems, with applications in materials design and molecular engineering, and the framework can be extended to broader multi-element datasets.
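The idea of correcting a starting Hamiltonian and differentiating through its eigenproblem can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes an orthonormal basis (no overlap matrix), an additive correction ${\mathbf H}^{\rm eff} = {\mathbf F}' + {\mathbf V}^{\theta}$, non-degenerate frontier levels, and hypothetical function names. It uses the standard first-order perturbation-theory result $\partial \epsilon_i / \partial H_{ab} = c_{ai} c_{bi}$ to obtain the gradient of the energy gap with respect to the correction matrix.

```python
import numpy as np


def effective_hamiltonian(F, V):
    """Hypothetical helper: symmetrized H_eff = F + V (additive form is an assumption)."""
    H = F + V
    return 0.5 * (H + H.T)


def gap_and_gradient(F, V, n_occ):
    """Return the HOMO-LUMO gap of H_eff and its gradient with respect to V.

    Assumes an orthonormal basis and non-degenerate HOMO and LUMO levels.
    """
    H = effective_hamiltonian(F, V)
    eps, C = np.linalg.eigh(H)  # eigenvalues in ascending order
    homo, lumo = n_occ - 1, n_occ
    gap = eps[lumo] - eps[homo]
    # First-order perturbation theory: d eps_i / d H_ab = C[a, i] * C[b, i]
    grad = np.outer(C[:, lumo], C[:, lumo]) - np.outer(C[:, homo], C[:, homo])
    return gap, grad


# Toy usage with random matrices standing in for F' and V^theta.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
F = 0.5 * (A + A.T)
V = 0.1 * rng.normal(size=(4, 4))
gap, grad = gap_and_gradient(F, V, n_occ=2)
```

In a training loop, a gradient of this kind would be chained with the network's own derivatives of ${\mathbf V}^{\theta}$ with respect to its parameters; properties that depend on eigenvectors need higher-order perturbation terms that this sketch omits.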
Abstract
Machine learning (ML) plays an important role in quantum chemistry, providing fast-to-evaluate predictive models for various properties of molecules. However, most existing ML models for molecular electronic properties use density functional theory (DFT) databases as ground truth in training, so their prediction accuracy cannot surpass that of DFT. In this work, we develop a unified ML method for the electronic structure of organic molecules, using gold-standard CCSD(T) calculations as training data. Tested on hydrocarbon molecules, our model outperforms DFT with widely used hybrid and double-hybrid functionals in both computational cost and prediction accuracy for various quantum chemical properties. As case studies, we apply the model to aromatic compounds and semiconducting polymers, for both ground-state and excited-state properties, demonstrating its accuracy and its ability to generalize to complex systems that are hard to calculate using CCSD(T)-level methods.
