Grad DFT: a software library for machine learning enhanced density functional theory

Pablo A. M. Casares; Jack S. Baker; Matija Medvidovic; Roberto dos Reis; Juan Miguel Arrazola

TL;DR

The paper tackles the challenge of improving DFT accuracy for strongly correlated systems by introducing Grad DFT, a fully differentiable library built in JAX that enables rapid development and training of neural functionals. It formalizes a general parametrized functional framework where coefficient functions, produced by neural networks, weight energy-density components to form exchange-correlation energies, including hybrids and dispersion corrections. Key contributions include a flexible, differentiable architecture for functionals, scalable SCF procedures, and integration with PySCF, along with a benchmark dataset of experimental dissociation energies for dimers to study generalization and data-noise effects. Empirical results illustrate both the potential and limitations of neural functionals for extrapolation across potential energy surfaces and across atomic species, and underscore the importance of data quality and possible benefits of self-consistent training for robust generalization.

Abstract

Density functional theory (DFT) stands as a cornerstone method in computational quantum chemistry and materials science due to its remarkable versatility and scalability. Yet, it suffers from limitations in accuracy, particularly when dealing with strongly correlated systems. To address these shortcomings, recent work has begun to explore how machine learning can expand the capabilities of DFT, an endeavor with many open questions and technical challenges. In this work, we present Grad DFT: a fully differentiable JAX-based DFT library, enabling quick prototyping and experimentation with machine learning-enhanced exchange-correlation energy functionals. Grad DFT employs a pioneering parametrization of exchange-correlation functionals constructed using a weighted sum of energy densities, where the weights are determined using neural networks. Moreover, Grad DFT encompasses a comprehensive suite of auxiliary functions, notably featuring a just-in-time compilable and fully differentiable self-consistent iterative procedure. To support training and benchmarking efforts, we additionally compile a curated dataset of experimental dissociation energies of dimers, half of which contain transition metal atoms characterized by strong electronic correlations. The software library is tested against experimental results to study the generalization capabilities of a neural functional across potential energy surfaces and atomic species, as well as the effect of training data noise on the resulting model accuracy.
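
As a rough illustration of the parametrization described in the abstract, the sketch below evaluates a weighted sum of energy densities on a radial quadrature grid in plain Python. It does not use Grad DFT's API. The hydrogen 1s density, the single LDA-exchange energy density, the trapezoid grid, and the tiny `coefficient_net` with hand-picked weights are all assumptions chosen to keep the example self-contained; in the library the weights come from a trained neural network and the computation is written in JAX.

```python
import math

# Assumed toy system: hydrogen 1s density in atomic units, rho(r) = exp(-2r) / pi.
def density(r):
    return math.exp(-2.0 * r) / math.pi

# A single energy density: spin-unpolarized LDA exchange, e_x = -C_x * rho^(4/3).
C_X = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)

def energy_densities(rho):
    return [-C_X * rho ** (4.0 / 3.0)]

# Hypothetical stand-in for a neural network: one hidden tanh layer with
# fixed, untrained weights that maps log(rho) to one local weight.
def coefficient_net(rho, params):
    w1, b1, w2, b2 = params
    log_rho = math.log(rho + 1e-12)
    hidden = [math.tanh(w * log_rho + b) for w, b in zip(w1, b1)]
    return [1.0 + sum(w * h for w, h in zip(w2, hidden)) + b2]

def xc_energy(coefficients, r_max=20.0, n=4000):
    """Trapezoid estimate of the integral of c(r) . e(r) over all space."""
    dr = r_max / n
    total = 0.0
    for i in range(n + 1):
        r = i * dr
        rho = density(r)
        c = coefficients(rho)
        e = energy_densities(rho)
        integrand = sum(ci * ei for ci, ei in zip(c, e))
        end_factor = 0.5 if i in (0, n) else 1.0
        total += end_factor * dr * 4.0 * math.pi * r * r * integrand
    return total

# Constant weight 1 recovers plain LDA exchange for this density.
lda_energy = xc_energy(lambda rho: [1.0])

# Spatially varying weights from the toy network (arbitrary parameters).
params = ([0.5, -0.3], [0.1, 0.2], [0.05, -0.02], 0.0)
neural_energy = xc_energy(lambda rho: coefficient_net(rho, params))

# Closed-form LDA exchange energy for the same density, used as a check.
analytic_lda = (-C_X * math.pi ** (-4.0 / 3.0)
                * 4.0 * math.pi * 2.0 / (8.0 / 3.0) ** 3)
```

With the weight fixed to 1 the quadrature agrees with the closed-form integral, which is a convenient sanity check before letting the weights depend on the local density.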

Paper Structure (4 sections, 23 equations, 3 figures)

Introduction
Related work
Parametrized functionals
The Grad DFT library

Figures (3)

Figure 1: Schematic depiction of the general workflow for machine learning-enhanced density functional theory. We envision a setting where high-quality data is generated using experiments and advanced wavefunction-based simulations on classical or quantum computers. These act as data factories generating large datasets that can be used for training new functionals. We consider the case of neural functionals, functionals in which neural networks are used to predict the local weights $\bm{c}_{\theta}[\rho](\bm{r})$ of associated energy densities $\bm{e}_{\theta}[\rho](\bm{r})$, when given the electronic density as input. $E_{xc,\theta}$ stands for a $\theta$-parametrized exchange-correlation energy functional.
Figure 2: Conceptual representation of the importance of allowing the coefficients $\bm{c}(\bm{r})$ to vary over space. In this example, we consider a basis of two one-dimensional functions $e_1(x)=\sin(2.5x)$ and $e_2(x)=1/x$. The goal is to combine them to produce the target function on the left, which transitions from sinusoidal oscillations to monotonic decay. This behavior cannot be replicated with constant coefficients $c_1$ and $c_2$, but it can be achieved by selecting the coefficient functions $c_1(x)=1-\text{erf}(x)$ and $c_2(x)=\text{erf}(x)$, and setting the model to be $\bm{c}(x)\cdot \bm{e}(x)=c_1(x)e_1(x) + c_2(x)e_2(x)$.
Figure 3: A pictorial summary and example of how the neural functional works. The value of the coefficients $\bm{c}_\theta[\rho](\bm{r})$ is fixed by a neural network whose input is the electronic density, its gradients, etc. The coefficients get dot-multiplied by some energy densities, and the result is integrated to compute the exchange-correlation energy.
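
The one-dimensional example in the Figure 2 caption can be checked numerically. The sketch below takes the target to be the blended function $c_1(x)e_1(x)+c_2(x)e_2(x)$ itself and compares it with the best least-squares fit using constant coefficients. The grid on $[0.1, 6]$, the helper `best_constant_fit`, and the use of a least-squares criterion are assumptions made for illustration and are not taken from the paper.

```python
import math

# Basis functions and coefficient functions from the Figure 2 caption.
def e1(x):
    return math.sin(2.5 * x)

def e2(x):
    return 1.0 / x

def c1(x):
    return 1.0 - math.erf(x)

def c2(x):
    return math.erf(x)

def model(x):
    """Spatially varying combination c(x) . e(x)."""
    return c1(x) * e1(x) + c2(x) * e2(x)

def best_constant_fit(xs, target):
    """Least-squares constants (a, b) for a*e1 + b*e2 via 2x2 normal equations."""
    s11 = sum(e1(x) ** 2 for x in xs)
    s12 = sum(e1(x) * e2(x) for x in xs)
    s22 = sum(e2(x) ** 2 for x in xs)
    t1 = sum(e1(x) * t for x, t in zip(xs, target))
    t2 = sum(e2(x) * t for x, t in zip(xs, target))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det

# Assumed grid: 600 points on [0.1, 6], avoiding the singularity of 1/x at 0.
xs = [0.1 + i * (5.9 / 599) for i in range(600)]
target = [model(x) for x in xs]

a, b = best_constant_fit(xs, target)
residuals = [a * e1(x) + b * e2(x) - t for x, t in zip(xs, target)]
rms_const = math.sqrt(sum(r * r for r in residuals) / len(xs))

print(f"best constants: a={a:.3f}, b={b:.3f}, RMS error={rms_const:.3f}")
```

The coefficient functions sum to one everywhere, so the model interpolates between the two basis functions: it follows $\sin(2.5x)$ where $\text{erf}(x)$ is small and $1/x$ where $\text{erf}(x)$ approaches one. No single pair of constants can do both, which is why the constant fit leaves a clearly nonzero residual.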
