Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning

He Zhang; Siyuan Liu; Jiacheng You; Chang Liu; Shuxin Zheng; Ziheng Lu; Tong Wang; Nanning Zheng; Bin Shao

TL;DR

M-OFDFT is an orbital-free DFT method built on a deep-learning kinetic energy density functional (KEDF) that captures non-local density interactions through an atomic-basis expansion and Graphormer-based attention. By training on multiple density states per structure (with energy and gradient labels) and enforcing SE(3) invariance via local frames, the method reaches KSDFT-competitive accuracy on molecular systems and extrapolates well to much larger molecules, such as those in QMugs and biomolecules. Empirically, M-OFDFT scales as $O(N^{1.46})$ and runs substantially faster than KSDFT (up to ~27x on protein-sized systems), which makes studies of large-scale molecular systems practical at higher accuracy than classical OFDFT. The work also introduces several practical techniques (on-manifold initialization for density optimization, enhancement modules for the large gradient range, and projection-based training data) that together stabilize learning and improve transfer to unseen chemical environments, with implications for biomolecular and materials-scale quantum simulations.
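An empirical exponent such as the $O(N^{1.46})$ quoted above is typically read off a straight-line fit in log-log space. The sketch below illustrates that idea only; the fitting procedure actually used in the paper is not described on this page, and the timings are synthetic values generated from an assumed power law, not measurements.

```python
import math

def fit_power_law(sizes, times):
    """Least-squares fit of log(t) = k * log(n) + log(c); returns (k, c)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                      # slope = scaling exponent
    c = math.exp(mean_y - k * mean_x)  # intercept = prefactor
    return k, c

# Synthetic timings from t = 0.002 * N^1.46 (illustrative, NOT data from the paper).
sizes = [100, 200, 400, 800, 1600]
times = [0.002 * n ** 1.46 for n in sizes]

exponent, prefactor = fit_power_law(sizes, times)
print(f"fitted exponent: {exponent:.2f}")
```

With real timings the points scatter around the line, so the fitted exponent depends on the range of system sizes included in the fit.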

Abstract

Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. Here we propose M-OFDFT, an OFDFT approach capable of solving molecular systems using a deep learning functional model. We build the essential non-locality into the model, which is made affordable by the concise density representation as expansion coefficients under an atomic basis. With techniques to address unconventional learning challenges therein, M-OFDFT achieves accuracy comparable to Kohn-Sham DFT on a wide range of molecules untouched by OFDFT before. More attractively, M-OFDFT extrapolates well to molecules much larger than those seen in training, which unleashes the appealing scaling of OFDFT for studying large molecules including proteins, representing an advancement of the accuracy-efficiency trade-off frontier in quantum chemistry.
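For orientation, the density representation and the optimization objective can be written compactly using the notation of the Figure 1 caption. This is a sketch under stated assumptions: the caption only says that the electronic energy combines the learned KEDF with "three other terms that can be directly evaluated", and the labels $E_\mathrm{H}$ (Hartree), $E_\mathrm{XC}$ (exchange-correlation), and $E_\mathrm{ext}$ (external potential) are assumed here from the standard DFT energy decomposition.

```latex
% Density expanded on the atomic basis, with coefficients p
\rho(\mathbf{r}) = \sum_{\mu} p_\mu \, \omega_\mu(\mathbf{r}), \qquad \mu = (a, \tau)

% Electronic energy minimized over p (the last three labels are assumed)
E_\theta(\mathbf{p}, \mathcal{M}) = T_{\mathrm{S},\theta}(\mathbf{p}, \mathcal{M})
  + E_\mathrm{H}(\mathbf{p}, \mathcal{M})
  + E_\mathrm{XC}(\mathbf{p}, \mathcal{M})
  + E_\mathrm{ext}(\mathbf{p}, \mathcal{M})
```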

Paper Structure

This paper contains 112 sections, 101 equations, 23 figures, 9 tables, and 3 algorithms.

Introduction
Results
Workflow of M-OFDFT
Performance of M-OFDFT on Molecular Systems
Extrapolation of M-OFDFT to Larger-Scale Molecules
QMugs
Chignolin
Empirical Time Complexity of M-OFDFT
Discussion
Methods
Training the KEDF Model
Geometric Invariance
Enhancement Modules for Vast Gradient Range
Dimension-wise Rescaling
Natural Reparameterization
...and 97 more sections

Figures (23)

Figure 1: Overview of M-OFDFT. (a) KSDFT solves the properties (for example, the ground-state electron density $\rho^{\star}$, the energy $E^{\star}$, and the force $\mathbf{f}$) of a molecular structure $\mathcal{M} := \{(\mathbf{x}^{(a)}, Z^{(a)})\}_{a=1}^A$ with $N$ electrons, where $\mathbf{x}^{(a)}$ and $Z^{(a)}$ denote the coordinates and atomic number of the $a$-th atom out of a total of $A$ atoms in the molecule, by optimizing $N$ orbital functions $\Phi := \{\phi_i(\mathbf{r})\}_{i=1}^N$, where $\phi_i$ denotes the $i$-th orbital, which is a function of the coordinates $\mathbf{r}$ of an electron, so that the kinetic energy can be evaluated directly ($\hat{T}$ is the kinetic-energy operator). In contrast, OFDFT only needs to optimize one density function $\rho(\mathbf{r})$ if the kinetic energy density functional (KEDF) $T_\textnormal{S}[\rho]$ is available, which reduces the complexity by an order of $N$. (b) The proposed M-OFDFT uses a deep learning model $T_{\textnormal{S},\theta}(\mathbf{p},\mathcal{M})$ ($\theta$ denotes learnable parameters) to approximate the KEDF, which is learned from data. The model incorporates non-local interaction of density over the space, which is made affordable by inputting a concise representation of the density (gray shaded region around the molecule): the expansion coefficients $\mathbf{p}$ on an atomic basis $\{\omega_\mu(\mathbf{r})\}_\mu$, where $\omega_\mu(\mathbf{r})$ is the $\mu$-th basis function and the index $\mu = (a, \tau)$ is composed of the center atom index $a$ and the pattern index $\tau$ (for example, the blue and red spheres at the bottom left illustrate two basis functions of two patterns centered at atom 2 (the carbon)). The coefficients are correspondingly distributed over the atoms. Non-locality is captured by the attention mechanism, which updates features on one atom by calculation with features on all other atoms, including distant ones (for example, the solid blue lines show that the update of the features $\mathbf{h}^{(2)}$ of atom 2 incorporates features on all other atoms). After updates by $L$ layers, the final scalar features over atoms are summed up to produce the kinetic energy value. (c) M-OFDFT solves a molecular structure $\mathcal{M}$ by optimizing the density coefficients $\mathbf{p}$ to minimize the electronic energy $E_\theta(\mathbf{p},\mathcal{M})$, which is constructed from the learned KEDF model and three other terms that can be directly evaluated. The red and blue hues represent values of electronic energy.
Figure 2: Results of M-OFDFT compared with classical OFDFT on molecular systems. (a) Relative energy (left) and Hellmann-Feynman (HF) force (right) results in terms of the MAE from KSDFT, with error bars showing 95% confidence intervals. Results for ethanol are statistics over 10,000 test structures, and results for QM9 $\mathrm{C_7 H_{10} O_2}$ isomer are statistics over 619 test isomers. (b) Visualization of optimized density. Each curve plots the integrated density on spheres with varying radii centered at the oxygen atom in an ethanol structure. (c) PES study on ethanol. (left) PES over various torsion angles along the H-C-C-O bond; (right) PES over various O-H bond lengths. The shaded region denotes the range within chemical accuracy ($1\,\mathrm{kcal/mol}$) with respect to KSDFT.
Figure 3: Extrapolation performance of M-OFDFT compared with other deep learning methods. Considered are M-NNP and M-NNP-Den, which use deep learning models to predict the ground-state energy end-to-end. The shades and error bars show 95% confidence intervals. (a) MAE of per-atom energy on increasingly larger molecules from the QMugs dataset, using models trained on molecules with no more than 15 heavy atoms from QM9 and QMugs datasets. Each value is calculated on 50 QMugs molecules. (b) Energy error on QMugs test molecules ($n$=50) with 56-60 heavy atoms, using models trained on a series of datasets containing increasingly larger QMugs molecules up to 30 heavy atoms. The horizontal dashed black line marks the performance of M-OFDFT trained on the first dataset. (c) Relative energy error on chignolin structures ($n$=50), using models trained on all peptides (lengths 2-5). Also shown is the result of the classical OFDFT using APBE. (d) Energy error on chignolin structures ($n$=1,000), using models trained on a series of datasets including increasingly longer peptides. (e) Energy error on chignolin structures ($n$=50), using models trained on all peptides without ('Pretrain') and with ('Finetune') finetuning on 500 chignolin structures. Also marked are error reduction ratios by the finetuned models over models trained from scratch ('FromScratch') on the 500 chignolin structures only.
Figure 4: Empirical time cost of M-OFDFT compared with KSDFT on molecules ($n$=808) at various scales. Each plotted value is the average of running times on molecules whose number of electrons falls in the corresponding bin of width 20.
Figure S1: The KEDF model architecture. (a) (See also Alg. \ref{alg:kedf}.) Overview of the model architecture. The model calculates the non-interacting kinetic energy (or a variant of it) from the given density, specified by the density coefficient vector $\mathbf{p}$ on an atomic basis set (Supplementary Section \ref{appx:coeff-spec}), as well as the atomic numbers $\mathbf{Z}$ and coordinates $\mathbf{X}$ of all atoms in the target molecule for characterizing the basis functions. The coefficient vector $\mathbf{p}$ is first processed by the CoefficientAdapter module to produce $\mathrm{SE}(3)$-invariant density features $\tilde{\mathbf{p}}$ and reduce the gradient scale for the rest of the model to fit. The $\tilde{\mathbf{p}}$ features are then used to construct initial node features $\mathbf{h}$ by the NodeEmbedding module (Supplementary Section \ref{appx:model-node-embed}), for which the types $\mathbf{Z}$ and positions $\mathbf{X}$ of the atoms are also incorporated to provide information to interpret the coefficient features. The positional input $\mathbf{X}$ is perceived by the model only in terms of pairwise distance features $\bm{\mathcal{E}}$ produced by the Gaussian basis function (GBF) module (Supplementary Section \ref{appx:model-gbf}). The node features $\mathbf{h}$ are subsequently updated by several Graphormer-3D (G3D) modules (Supplementary Section \ref{appx:model-g3d}). They calculate the interaction of features on every pair of nodes and hence cover the non-local effect, in which relative position features $\tilde{\bm{\mathcal{E}}}$ between each pair of nodes are considered, which are processed from $\bm{\mathcal{E}}$ by a multi-layer perceptron (MLP) module (Eq. \ref{eqn:mlp}). The final node features are aggregated by another MLP module and summed over the nodes to produce a scalar, and the final output is its addition with the output of the AtomRef enhancement module (Methods \ref{sec:learn-grad}, Supplementary Section \ref{appx:atom-ref}), which shares the burden of modeling a large-scale gradient. (b) (See also Alg. \ref{alg:coeffada}.) Structure of the CoefficientAdapter module. It consists of the LocalFrame module (Methods \ref{sec:local-frame}; Supplementary Section \ref{appx:local-frame}) to convert $\mathrm{SE}(3)$-equivariant features into invariant features, followed by two enhancement modules, NaturalReparameterization and DimensionwiseRescaling (Methods \ref{sec:learn-grad}; Supplementary Sections \ref{appx:nat-reparam} and \ref{appx:dim-rescale}), which reduce the gradient scale for subsequent modules to fit. (c) Structure of the NodeEmbedding module (Supplementary Section \ref{appx:model-node-embed}). It integrates information from three sources for each node: atom type (for basis-set type) $\mathbf{Z}$, which is mapped to a feature vector by atom embedding; positional features in terms of pairwise distance features $\bm{\mathcal{E}}$, which are aggregated over nodes and processed by an MLP module; and density features $\tilde{\mathbf{p}}$, which are mapped to a mild numerical range by a shrink gate and processed by another MLP module. (d) Structure of the G3D module (Supplementary Section \ref{appx:model-g3d}). The SelfAttention module updates node features by the non-local cross-node interaction of features $\mathbf{h}$ based on spatial relation features $\tilde{\bm{\mathcal{E}}}$ between nodes. An MLP module further processes the updated node features. The LayerNorm module is applied before each of the two modules for numerical convenience. Note that the data on each streamline is the concatenated features across the nodes. Only the SelfAttention module models the interaction across nodes, while other modules are applied to the features of each node independently.
...and 18 more figures
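The density optimization described in the Figure 1(c) caption (minimizing an energy over the coefficients $\mathbf{p}$) can be illustrated with a toy projected gradient descent. This is a minimal sketch under loud assumptions: the quadratic `toy_energy` is a stand-in and not the learned functional, and the linear electron-count constraint with the hypothetical weight vector `w` (so that `w . p = N`) is an assumed form of the normalization that is not spelled out on this page.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def toy_energy(p, curv, center):
    """Stand-in for E_theta(p, M): a convex quadratic, NOT the learned model."""
    return 0.5 * sum(c * (x - m) ** 2 for c, x, m in zip(curv, p, center))

def toy_gradient(p, curv, center):
    """Analytic gradient of toy_energy with respect to p."""
    return [c * (x - m) for c, x, m in zip(curv, p, center)]

def project_out(grad, w):
    """Remove the component of grad along w, so a step leaves w . p unchanged."""
    scale = dot(grad, w) / dot(w, w)
    return [g - scale * wi for g, wi in zip(grad, w)]

def optimize_density(p0, w, curv, center, lr=0.1, steps=500):
    """Gradient descent restricted to the hyperplane w . p = const."""
    p = list(p0)
    for _ in range(steps):
        g = project_out(toy_gradient(p, curv, center), w)
        p = [x - lr * gi for x, gi in zip(p, g)]
    return p

# Hypothetical toy problem with three coefficients.
w = [1.0, 2.0, 0.5]            # assumed normalization weights
curv = [1.0, 3.0, 2.0]         # curvatures of the toy energy
center = [0.5, 1.5, -1.0]      # unconstrained minimizer of the toy energy
n_electrons = 4.0

# Initial guess chosen to satisfy the constraint w . p = N exactly.
p_init = [n_electrons / dot(w, w) * wi for wi in w]
p_opt = optimize_density(p_init, w, curv, center)

print("electron count:", dot(w, p_opt))
print("energy before/after:", toy_energy(p_init, curv, center),
      toy_energy(p_opt, curv, center))
```

Projecting the gradient rather than renormalizing after each step keeps every iterate on the constraint surface, which is why the electron count is preserved throughout the loop.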
