High-order Equivariant Flow Matching for Density Functional Theory Hamiltonian Prediction

Seongsu Kim, Nayoung Kim, Dongwoo Kim, Sungsoo Ahn

TL;DR

QHFlow reframes Hamiltonian prediction for KS-DFT as learning a distribution over Hamiltonians conditioned on molecular geometry, using high-order SE(3)-equivariant flow matching to capture the structured, blockwise symmetry of RH-DFT Hamiltonians. It introduces symmetry-aware priors (GOE and tensor expansion) and a post hoc energy-alignment fine-tuning to ensure physically faithful orbital energies, achieving state-of-the-art Hamiltonian MAEs on MD17 and QH9 and enabling substantial acceleration of SCF convergence when used to initialize DFT calculations. The approach combines a CNF trajectory with SE(3)-equivariant vector fields and a graph neural architecture to preserve rotational symmetry across all trajectory steps, offering robust generalization across geometries and molecular sizes. This work demonstrates the practicality of flow-based, symmetry-aware Hamiltonian generation as a scalable surrogate for expensive DFT computations, with clear benefits for speed and reliability in quantum chemistry workflows.

Abstract

Density functional theory (DFT) is a fundamental method for simulating quantum chemical properties, but it remains expensive due to the iterative self-consistent field (SCF) process required to solve the Kohn-Sham equations. Deep learning methods have recently gained attention as a way to bypass this step by directly predicting the Hamiltonian. However, they rely on deterministic regression and do not consider the highly structured nature of Hamiltonians. In this work, we propose QHFlow, a high-order equivariant flow matching framework that generates Hamiltonian matrices conditioned on molecular geometry. Flow matching models continuous-time trajectories between simple priors and complex targets, learning structured distributions over Hamiltonians instead of performing direct regression. To incorporate symmetry, we use a neural architecture that predicts SE(3)-equivariant vector fields, improving accuracy and generalization across diverse geometries. To enhance physical fidelity, we additionally introduce a fine-tuning scheme that aligns predicted orbital energies with the target. QHFlow achieves state-of-the-art performance, reducing Hamiltonian error by 71% on MD17 and 53% on QH9. Moreover, we show that initializing SCF iterations with the predicted Hamiltonian accelerates the DFT process without sacrificing solution quality, significantly reducing the number of iterations and runtime.
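The flow matching idea in the abstract can be illustrated with a minimal, standard-library sketch. It assumes the common straight-line path $\mathbf{H}_t = (1-t)\,\mathbf{H}_0 + t\,\mathbf{H}_1$ between a prior sample and a target, which may differ from the path actually used by QHFlow. All function names below are hypothetical, and the "oracle" velocity stands in for a trained equivariant network.

```python
import random

def sample_symmetric_prior(n, rng):
    """Draw a random symmetric n x n matrix (a stand-in for a prior sample H_0)."""
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = rng.gauss(0.0, 1.0)
            h[i][j] = value
            h[j][i] = value
    return h

def interpolate(h0, h1, t):
    """Point on the assumed straight-line path H_t = (1 - t) * H_0 + t * H_1."""
    n = len(h0)
    return [[(1.0 - t) * h0[i][j] + t * h1[i][j] for j in range(n)] for i in range(n)]

def target_velocity(h0, h1):
    """Velocity of the straight-line path, dH_t/dt = H_1 - H_0."""
    n = len(h0)
    return [[h1[i][j] - h0[i][j] for j in range(n)] for i in range(n)]

def flow_matching_loss(predicted_velocity, h0, h1):
    """Mean squared error between a predicted velocity and the path velocity."""
    target = target_velocity(h0, h1)
    n = len(h0)
    total = sum((predicted_velocity[i][j] - target[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)

def euler_integrate(h0, velocity_fn, steps):
    """Integrate dH/dt = velocity_fn(H, t) from t = 0 to t = 1 with Euler steps."""
    h = [row[:] for row in h0]
    n = len(h)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(h, k * dt)
        h = [[h[i][j] + dt * v[i][j] for j in range(n)] for i in range(n)]
    return h

# Usage: an oracle velocity field (in place of a trained model) carries H_0 to H_1.
rng = random.Random(0)
h0 = sample_symmetric_prior(3, rng)
h1 = [[1.0, 0.5, 0.0], [0.5, -2.0, 0.25], [0.0, 0.25, 3.0]]  # toy target
oracle = target_velocity(h0, h1)
h_final = euler_integrate(h0, lambda h, t: oracle, steps=10)
```

In training, the oracle would be replaced by a learned vector field evaluated at $(\mathbf{H}_t, t)$ and the molecular geometry, and the loss above would be minimized over sampled $t$, $\mathbf{H}_0$, and $\mathbf{H}_1$.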

Paper Structure

This paper contains 41 sections, 2 theorems, 83 equations, 9 figures, 11 tables, 2 algorithms.

Key Result

Theorem 1

Let ${\mathbf{H}=\bigl(\bar{\otimes}\mathbf{w}^{(\ell)}\bigr)^{(\ell_1,\ell_2)}}$, where an irrep vector $\mathbf{w}^{(\ell)}\sim p(\mathbf{w}^{(\ell)})$ is drawn from an SO(3)-invariant distribution. Then the induced distribution over $\mathbf{H}$ is invariant under SO(3) transformations.

Figures (9)

  • Figure 1: Overview of QHFlow. (a) Initial Hamiltonian sampled from the tensor expansion-based SE(3)-invariant prior (TE). (b) Initial Hamiltonian sampled from the Gaussian orthogonal ensemble SE(3)-invariant prior (GOE). (c) QHFlow transforms the initial Hamiltonian $\mathbf{H}_0$, sampled from an invariant prior $p_0$, into the target Hamiltonian $\mathbf{H}_1$ using flow matching, guided by an SE(3)-equivariant vector field $v_\theta(t,\mathcal{M})$. The predicted Hamiltonian $\mathbf{H}_1$ defines a learned target distribution $p^\theta_1$ and is used to compute the electronic density $\rho_{\mathcal{M}}$, as well as $\epsilon_{\text{LUMO}}$ and $\epsilon_{\text{HOMO}}$. When $\mathbf{H}_1$ is used to initialize SCF, QHFlow can accelerate conventional DFT and SCF procedures.
  • Figure 2: Illustration of the Hamiltonian matrix structure of H2O. Rows and columns are indexed by quantum and angular momentum numbers (e.g., $1s$, $2s$) ordered by atom (O, H, H). The full matrix (right) is partitioned into atomic blocks by gray dashed lines; the top-left $9 \times 9$ sub-matrix corresponds to O and is shown in detail (left), grouped by quantum and angular momentum. Colors indicate the sign and magnitude of matrix elements.
  • Figure 3: DFT acceleration performance on 300 samples from the QH9 dataset. All metrics are reported as percentages relative to conventional DFT (initialized with minao), which serves as the 100% baseline. The SCF Iter Ratio measures the ratio of SCF iterations required, while Inf T Ratio, SCF T Ratio, and Total T Ratio measure time. Lower SCF Iter Ratio and Total T Ratio values indicate faster convergence. For example, a Total T Ratio of 46% means QHFlow converges in 46% of the conventional DFT time, including the negligible model inference time.
  • Figure 4: (a) Schematic illustration of the full Hamiltonian matrix $\mathbf{H}$ for a water molecule (H2O). Color intensity indicates the magnitude of matrix elements, with red representing larger values and blue representing smaller values. (b) Schematic illustration of the full Wigner D-matrix $\mathcal{D}$ corresponding to $\mathbf{H}$, where green denotes larger values and purple denotes smaller values. Gray solid and dashed lines separate molecular blocks and orbital blocks, respectively, corresponding to submatrices defined by atomic orbital pairs.
  • Figure 5: DFT acceleration performance on 300 samples from the QH9 dataset. All metrics are reported as percentages relative to conventional DFT (initialized with minao), which serves as the 100% baseline. The SCF Iter Ratio measures the ratio of SCF iterations required, while Inf T Ratio, SCF T Ratio, and Total T Ratio measure time.
  • ...and 4 more figures
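
The GOE prior mentioned in the Figure 1 caption can be sketched with the standard library alone. This is a minimal illustration under one common GOE convention (off-diagonal variance $\sigma^2$, diagonal variance $2\sigma^2$), not necessarily the scaling used in the paper. The GOE density depends on $\mathbf{H}$ only through $\operatorname{tr}(\mathbf{H}^2)$, so conjugating a sample by any orthogonal matrix leaves its density unchanged. Here a plain 3D rotation about the z-axis stands in for the Wigner D-matrix acting on the full Hamiltonian, which is an assumption of this sketch. All function names are hypothetical.

```python
import math
import random

def sample_goe(n, rng, sigma=1.0):
    """Sample a symmetric matrix with N(0, 2*sigma^2) diagonal and N(0, sigma^2) off-diagonal entries."""
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        h[i][i] = rng.gauss(0.0, math.sqrt(2.0) * sigma)
        for j in range(i + 1, n):
            value = rng.gauss(0.0, sigma)
            h[i][j] = value
            h[j][i] = value
    return h

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transpose(a):
    """Transpose of a square matrix stored as nested lists."""
    n = len(a)
    return [[a[j][i] for j in range(n)] for i in range(n)]

def rotation_z(theta):
    """3 x 3 rotation about the z-axis (an orthogonal matrix)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def conjugate(q, h):
    """Transform H -> Q H Q^T."""
    return matmul(matmul(q, h), transpose(q))

def trace_of_square(h):
    """tr(H^2), the only quantity the GOE density depends on."""
    n = len(h)
    return sum(h[i][j] * h[j][i] for i in range(n) for j in range(n))

# Usage: a rotated sample has the same tr(H^2), hence the same GOE density.
rng = random.Random(0)
h = sample_goe(3, rng)
h_rotated = conjugate(rotation_z(0.7), h)
density_gap = abs(trace_of_square(h) - trace_of_square(h_rotated))
```

The value of `density_gap` is zero up to floating-point rounding, which is the property that makes a GOE sample a rotation-invariant starting point for the flow.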

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2