Self-Consistency Training for Density-Functional-Theory Hamiltonian Prediction

He Zhang, Chang Liu, Zun Wang, Xinran Wei, Siyuan Liu, Nanning Zheng, Bin Shao, Tie-Yan Liu

TL;DR

This work highlights that Hamiltonian prediction possesses a self-consistency principle and, based on it, proposes self-consistency training: an exact training method that requires no labeled data and is more efficient than running DFT to generate labels for supervised training, since it amortizes the DFT calculation over a set of queries.

Abstract

Predicting the mean-field Hamiltonian matrix in density functional theory is a fundamental formulation to leverage machine learning for solving molecular science problems. Yet, its applicability is limited by insufficient labeled data for training. In this work, we highlight that Hamiltonian prediction possesses a self-consistency principle, based on which we propose self-consistency training, an exact training method that does not require labeled data. It distinguishes the task from predicting other molecular properties by the following benefits: (1) it enables the model to be trained on a large amount of unlabeled data, thereby addressing the data scarcity challenge and enhancing generalization; (2) it is more efficient than running DFT to generate labels for supervised training, since it amortizes the DFT calculation over a set of queries. We empirically demonstrate better generalization in data-scarce and out-of-distribution scenarios, and better efficiency than DFT labeling. These benefits push forward the applicability of Hamiltonian prediction to an ever-larger scale.

Paper Structure (56 sections, 42 equations, 10 figures, 14 tables, 1 algorithm)

Figures (10)

  • Figure 1: Hamiltonian prediction is the task of using a machine learning model to predict the mean-field Hamiltonian matrix $\hat{\mathbf{H}}_\theta(\mathcal{M})$ in density functional theory from a given molecular structure $\mathcal{M} := \{\mathcal{Z}, \mathcal{R}\}$ specified by the atomic types $\mathcal{Z}$ and coordinates $\mathcal{R}$ of atoms. The predicted Hamiltonian can be used to derive various molecular properties, e.g., the total energy $E$, the HOMO and LUMO energies $\epsilon_\mathrm{HOMO}, \epsilon_\mathrm{LUMO}$ and their gap $\epsilon_\Delta$ for the given molecule, and can also serve as an accurate DFT initialization. We highlight in this work that the task has a self-consistency principle (the blue loop arrow), which allows training the model without labeled data.
  • Figure 2: Illustration of the proposed self-consistency training with comparison to the conventional DFT calculation and supervised training. (Left) The central task of a DFT calculation is to solve the Kohn-Sham equation (Eq. \ref{eqn:ks-eq-mat}) for the given molecular structure $\mathcal{M}$. (Middle) The equation is equivalent to the condition that the eigenvectors $\mathbf{C}$ of $\mathbf{H}$ recover $\mathbf{H}$ via a known function $\mathbf{H}_\mathcal{M}(\mathbf{C})$. (Top-Right) To solve the equation, conventional DFT uses a fixed-point iteration (SCF iteration), which, upon convergence, gives the label $\mathbf{H}^\star_\mathcal{M}$ for supervised training (Eq. \ref{eqn:data-loss}) of a Hamiltonian prediction model $\hat{\mathbf{H}}_\theta(\mathcal{M})$. (Bottom-Right) In contrast, self-consistency training (Eq. \ref{eqn:sc-loss}) directly minimizes the mismatch between the predicted Hamiltonian $\hat{\mathbf{H}}_\theta(\mathcal{M})$ and the matrix $\mathbf{H}_\mathcal{M}(\mathbf{C}_{\mathcal{M},\theta})$ reconstructed from its eigenvectors.
  • Figure 3: Efficiency comparison in the data-scarce scenario (MD17 Hamiltonian) among self-consistency training on unlabeled data, supervised training following DFT labeling on unlabeled data (extended-label), and supervised training along with DFT labeling (extended-label-online). Dotted horizontal lines extend from the last measured point of the respective curves.
  • Figure 4: Efficiency comparison in the OOD scenario (QH9) among fine-tuning using self-consistency training on unlabeled data, supervised training following DFT labeling on unlabeled data (extended-label), and supervised training along with DFT labeling (extended-label-online). Dotted horizontal lines extend from the last measured point of the respective curves.
  • Figure 2.1: The overall architecture of the adapter module. Given the atom types $\mathcal{Z}$ and positions $\mathcal{R}$ as inputs, the pretrained QHNet model is used to produce atomic representations $\mathbf{h}$, pairwise representations $\mathbf{f}$ and the initial Hamiltonian prediction $\hat{\mathbf{H}}$. Subsequently, the adapter module produces a refined Hamiltonian $\hat{\mathbf{H}}'$ based on $\mathbf{h}$ and $\mathbf{f}$. Finally, the refined Hamiltonian is combined with the initial Hamiltonian prediction to give the final output $\hat{\mathbf{H}}''$. $t_1$, $t_2$, $o_1$ and $o_2$ denote learnable combination coefficients. $a$ and $b$ denote the indices of atoms.
  • ...and 5 more figures
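
The self-consistency condition described in the Figure 2 caption can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the paper's implementation: the function names (`solve_orbitals`, `toy_reconstruct`, `sc_residual`, `scf`), the 4x4 matrices and the coupling `g` are all hypothetical, and `toy_reconstruct` is only a stand-in for the known function $\mathbf{H}_\mathcal{M}(\mathbf{C})$, which in real DFT is built from electron integrals and the exchange-correlation functional. The sketch compares the mismatch between a matrix and the matrix rebuilt from its own eigenvectors, before and after a plain SCF fixed-point iteration.

```python
import numpy as np

def solve_orbitals(H, S):
    """Solve H C = S C diag(eps) for symmetric H and positive-definite S."""
    s_vals, s_vecs = np.linalg.eigh(S)
    X = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T  # symmetric orthogonalizer S^{-1/2}
    eps, C_orth = np.linalg.eigh(X @ H @ X)          # ordinary symmetric eigenproblem
    return eps, X @ C_orth                           # back-transform so that C^T S C = I

def toy_reconstruct(C_occ, H_core, g):
    """Hypothetical stand-in for H_M(C): core term plus a density-dependent shift."""
    D = 2.0 * C_occ @ C_occ.T                        # closed-shell density matrix
    return H_core + g * np.diag(np.diag(D))

def sc_residual(H, S, H_core, g, n_occ):
    """Mean squared mismatch between H and the matrix rebuilt from its eigenvectors."""
    _, C = solve_orbitals(H, S)
    return float(np.mean((H - toy_reconstruct(C[:, :n_occ], H_core, g)) ** 2))

def scf(S, H_core, g, n_occ, tol=1e-10, max_iter=200):
    """Plain fixed-point (SCF) iteration H <- H_M(C(H)), started from H_core."""
    H = H_core.copy()
    for _ in range(max_iter):
        _, C = solve_orbitals(H, S)
        H_new = toy_reconstruct(C[:, :n_occ], H_core, g)
        if np.max(np.abs(H_new - H)) < tol:
            return H_new, True
        H = H_new
    return H, False

# Toy problem (all values are illustrative assumptions).
n, n_occ, g = 4, 2, 0.2
off = np.ones((n, n)) - np.eye(n)
H_core = np.diag([-2.0, -1.0, 0.5, 1.5]) + 0.1 * off
S = np.eye(n) + 0.05 * off                           # toy overlap matrix

H_star, converged = scf(S, H_core, g, n_occ)
res_guess = sc_residual(H_core, S, H_core, g, n_occ)  # residual of an unconverged guess
res_star = sc_residual(H_star, S, H_core, g, n_occ)   # residual at the fixed point
eps, _ = solve_orbitals(H_star, S)
gap = eps[n_occ] - eps[n_occ - 1]                     # HOMO-LUMO gap from the eigenvalues

print("converged:", converged)
print("residual at initial guess:", res_guess)
print("residual at fixed point:", res_star)
print("HOMO-LUMO gap:", gap)
```

In this toy, the residual is nonzero for the initial guess and vanishes (up to the tolerance) at the SCF fixed point, which is the property a label-free loss can exploit. In a training setting, the input `H` would presumably be the model prediction $\hat{\mathbf{H}}_\theta(\mathcal{M})$ and the residual would be minimized with respect to $\theta$ using automatic differentiation, which this sketch does not include.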