Molecular-orbital-based Machine Learning for Open-shell and Multi-reference Systems with Kernel Addition Gaussian Process Regression
Lixue Cheng, Jiace Sun, J. Emiliano Deustua, Vignesh C. Bhethanabotla, Thomas F. Miller
TL;DR
This work introduces kernel addition Gaussian process regression (KA-GPR) within molecular-orbital-based ML (MOB-ML) to directly learn total correlation energies for general electronic structure theories, including open-shell and multi-reference systems. By leveraging Nesbet's theorem and ROHF-based MOB features, KA-GPR unifies predictions for closed- and open-shell cases. It demonstrates state-of-the-art accuracy on the Criegee molecule, the H10 chain, small radicals, water dissociation, the QM9/QM7b-T/GDB-13-T benchmarks, and the QMSpin carbene dataset. Key findings are that chemical accuracy is achievable with modest training data, that potential energy surface (PES) predictions are robust, and that transferability is strong among molecules of similar size, although the transfer of absolute energies degrades when moving between datasets. Overall, KA-GPR advances MOB-ML toward practical, high-accuracy predictions of electronic energies beyond CCSD(T) for diverse chemical systems, with significant implications for studying challenging open-shell reactions and radical chemistry.
Abstract
We introduce a novel machine learning strategy, kernel addition Gaussian process regression (KA-GPR), in molecular-orbital-based machine learning (MOB-ML) to learn the total correlation energies of general electronic structure theories for closed- and open-shell systems. The learning efficiency of MOB-ML (KA-GPR) matches that of the original MOB-ML method for the smallest Criegee molecule, a closed-shell molecule with multi-reference character. In addition, predictions for several small free radicals reach chemical accuracy (1 kcal/mol) when trained on a single example structure. MOB-ML (KA-GPR) also generates accurate potential energy surfaces for the H10 chain (closed-shell) and for OH bond dissociation in water (open-shell). To explore the breadth of chemical systems that KA-GPR can describe, we further apply MOB-ML to accurately predict energies in large benchmark datasets of closed-shell (QM9, QM7b-T, GDB-13-T) and open-shell (QMSpin) molecules.
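The abstract does not spell out how kernel addition lets a GP learn total energies directly. A minimal sketch, assuming (as Nesbet's theorem motivates) that a molecule's total correlation energy is a sum of orbital-pair contributions, each modeled by one GP with a squared-exponential base kernel: the covariance between two molecules' total energies is then the sum of the base kernel over every pair of their MOB feature vectors. The function names, the RBF base kernel, and the toy features are hypothetical choices for illustration, not the authors' implementation (which may, for example, use separate kernels for different kinds of pair features).

```python
import numpy as np

def rbf(X, Y, length_scale=1.0):
    # Squared-exponential base kernel between two sets of pair-feature vectors
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)

def molecule_kernel(mols_a, mols_b, length_scale=1.0):
    # Kernel addition: K[a, b] sums the base kernel over all feature pairs,
    # i.e. the covariance of two sums of GP-distributed pair energies
    K = np.empty((len(mols_a), len(mols_b)))
    for i, A in enumerate(mols_a):
        for j, B in enumerate(mols_b):
            K[i, j] = rbf(A, B, length_scale).sum()
    return K

def fit_predict(train_mols, y_train, test_mols, noise=1e-6, length_scale=1.0):
    # Standard GP posterior mean, trained only on molecule-level total energies
    K = molecule_kernel(train_mols, train_mols, length_scale)
    K[np.diag_indices_from(K)] += noise
    alpha = np.linalg.solve(K, y_train)
    K_star = molecule_kernel(test_mols, train_mols, length_scale)
    return K_star @ alpha
```

Each molecule is a `(n_pairs, n_features)` array, so molecules of different sizes share one model. Because a test molecule's kernel row is a sum over its pairs, the predicted energy of two non-interacting fragments combined equals the sum of their separate predictions.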
