QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules
Haiyang Yu, Meng Liu, Youzhi Luo, Alex Strasser, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji
TL;DR
The paper addresses the high cost of obtaining quantum Hamiltonian matrices $\mathbf{H}$ by introducing QH9, a large-scale QM9-derived benchmark with 130,831 stable geometries and 999 or 2998 molecular dynamics (MD) trajectories. It advocates SE(3)-equivariant quantum tensor networks (notably QHNet) to predict Hamiltonian blocks, and it defines four evaluation tasks (ID/OOD, geometry- and molecule-wise) and several metrics: MAE on $\mathbf{H}$, MAE on the orbital energies $\boldsymbol{\epsilon}$, cosine similarity of the orbital coefficients $\psi$, and DFT-acceleration ratios. Key contributions are a sizable Hamiltonian-matrix dataset, a structured benchmark with diverse test scenarios, and evidence that current equivariant models can predict $\mathbf{H}$ across many molecules and provide a faster DFT initialization. The benchmark is a resource for developing Hamiltonian-prediction methods, with practical use in accelerating electronic structure calculations for molecular and materials design.
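To make the three accuracy metrics named above concrete, here is a minimal NumPy sketch. It assumes a predicted and a reference Hamiltonian in the same non-orthogonal basis, a shared overlap matrix $\mathbf{S}$, and the generalized eigenproblem $\mathbf{H}\mathbf{C} = \mathbf{S}\mathbf{C}\boldsymbol{\epsilon}$. The function names `solve_orbitals` and `hamiltonian_metrics`, the restriction to the lowest `n_occ` orbitals, and the use of an absolute value to remove the orbital sign ambiguity are illustrative assumptions, not the benchmark's own implementation.

```python
import numpy as np

def solve_orbitals(H, S):
    """Solve H C = S C eps by symmetric (Loewdin) orthogonalization.

    Assumes H is symmetric and S is symmetric positive definite.
    Returns eigenvalues in ascending order and the matching coefficients C.
    """
    s_vals, s_vecs = np.linalg.eigh(S)
    # S^{-1/2} = V diag(1/sqrt(s)) V^T
    S_inv_sqrt = (s_vecs / np.sqrt(s_vals)) @ s_vecs.T
    eps, C_orth = np.linalg.eigh(S_inv_sqrt @ H @ S_inv_sqrt)
    return eps, S_inv_sqrt @ C_orth

def hamiltonian_metrics(H_pred, H_true, S, n_occ):
    """Hypothetical helper: MAE on H, MAE on eps, cosine similarity of psi."""
    mae_h = np.abs(H_pred - H_true).mean()

    eps_p, C_p = solve_orbitals(H_pred, S)
    eps_t, C_t = solve_orbitals(H_true, S)

    # Compare only the n_occ lowest-energy (occupied) orbitals.
    mae_eps = np.abs(eps_p[:n_occ] - eps_t[:n_occ]).mean()

    # Per-orbital cosine similarity; abs() ignores the arbitrary sign
    # of each eigenvector. Degenerate orbitals are not treated specially.
    Cp, Ct = C_p[:, :n_occ], C_t[:, :n_occ]
    num = np.abs((Cp * Ct).sum(axis=0))
    den = np.linalg.norm(Cp, axis=0) * np.linalg.norm(Ct, axis=0)
    cos_psi = (num / den).mean()

    return mae_h, mae_eps, cos_psi
```

In this sketch a perfect prediction gives zero for both MAE values and a cosine similarity of 1. Small errors in $\mathbf{H}$ can still change $\boldsymbol{\epsilon}$ and $\psi$ noticeably, which is one reason to report all three quantities rather than the matrix error alone.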
Abstract
Supervised machine learning approaches have been increasingly used in accelerating electronic structure prediction as surrogates of first-principles computational methods, such as density functional theory (DFT). While numerous quantum chemistry datasets focus on chemical properties and atomic forces, the ability to achieve accurate and efficient prediction of the Hamiltonian matrix is highly desired, as it is the most important and fundamental physical quantity that determines the quantum states of physical systems and chemical properties. In this work, we generate a new Quantum Hamiltonian dataset, named QH9, to provide precise Hamiltonian matrices for 999 or 2998 molecular dynamics trajectories and 130,831 stable molecular geometries, based on the QM9 dataset. By designing benchmark tasks with various molecules, we show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules. Both the QH9 dataset and the baseline models are provided to the community through an open-source benchmark, which can be highly valuable for developing machine learning methods and accelerating molecular and materials design for scientific and technological applications. Our benchmark is publicly available at https://github.com/divelab/AIRS/tree/main/OpenDFT/QHBench.
