Addressing Ill-conditioning in Density Functional Theory for Reliable Machine Learning

L. Arnstein; J. Wetherell; R. Lawrence; P. J. Hasnip; M. J. P. Hodgson

TL;DR

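Density functionals that are globally gauge-dependent, or that depend implicitly on the (N+1)-electron density, are ill-conditioned and therefore hard to machine-learn. Because potential functionals do not suffer from this ill-conditioning, providing the external potential as input to the ML model leads to significantly improved predictions of quantities in these two classes.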

Abstract

In principle, machine learning (ML) can be used to obtain any electronic property of a many-body system from its electron density within density functional theory. However, some physical quantities are highly sensitive to small variations in the density. This 'ill-conditioning' limits the accuracy with which these quantities can be learned as density functionals from a fixed amount of data. We identify sources of ill-conditioning present in density functionals that belong to two ubiquitous classes: 1) Physical quantities that are globally gauge-dependent, meaning they change value if a constant shift is applied to the external potential -- for example, the total energy; 2) Functionals of the N-electron density that have an implicit dependence on the (N+1)-electron density, such as the fundamental gap. We demonstrate that widely used ML models exhibit orders-of-magnitude greater error when applied to these ill-conditioned density functionals compared to other functionals that fall into neither class, even when the global gauge is fixed to prevent constant shifts. Owing to an absence of ill-conditioning in potential functionals, we find that providing the external potential as input to the ML model leads to significantly improved predictions of quantities in these two classes.
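
The first class can be made concrete with a short worked derivation. The sketch below assumes the standard textbook definitions of the ionisation energy $I$, electron affinity $A$, fundamental gap $E_g$ and universal functional $F[n]$, which the abstract does not spell out; $c$ denotes the constant shift and $E_N$ the $N$-electron ground-state energy.

```latex
\begin{align*}
v_{\mathrm{ext}}(\mathbf{r}) &\to v_{\mathrm{ext}}(\mathbf{r}) + c
  \quad\Longrightarrow\quad \hat{H}_N \to \hat{H}_N + cN, \\
n(\mathbf{r}) &\to n(\mathbf{r}), \\
E_N &\to E_N + cN, \\
I = E_{N-1} - E_N &\to I - c, \\
A = E_N - E_{N+1} &\to A - c, \\
E_g = I - A &\to E_g, \\
F[n] = E_N - \int v_{\mathrm{ext}}(\mathbf{r})\, n(\mathbf{r})\, \mathrm{d}\mathbf{r} &\to F[n].
\end{align*}
```

Under these assumptions the density is unchanged while $E$, $I$ and $A$ all move, so a single input would map to several outputs unless the global gauge is fixed. $E_g$ and $F[n]$ are invariant under the shift, which is consistent with the fundamental gap being listed under the second class rather than the first.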

Paper Structure (7 sections, 7 equations, 5 figures)

Introduction
Methods
Data generation
Data analysis
Machine learning
Results
Conclusion

Figures (5)

Figure 1: (a) Schematic of a globally gauge-dependent density functional that experiences a constant shift in the external potential. The quantity jumps in value, but the electron density does not change. Hence, the change in the functional is discontinuous with respect to the density. As the same input maps to two different outputs, the learning problem is intractable. This situation is avoided by fixing the global gauge, which prevents constant shifts from occurring. (b) Schematic of a globally gauge-dependent density functional that experiences a near-constant shift in the external potential. Although the functional is now smooth, machine-learning the mapping between the density and the functional is challenging in this region of density space because the functional changes so rapidly.
Figure 2: An example pair of two-electron molecular systems from our dataset that shows how systems can differ by a near-constant shift. The black solid lines and dashed orange lines each correspond to a system (see text for details), for which the external potentials and electron densities are plotted in (c) and (a), respectively. The blue dotted lines show the difference between the external potentials, which, due to the fixed global gauge, is non-constant. However, panel (d) shows that over the region where most of the electron density is concentrated, the difference between the two external potentials is approximately constant. Consequently, the two systems have almost identical electron densities (b), but differ substantially in their globally gauge-dependent properties -- for example, the system represented by solid black lines has a total energy of -7.72 Ha, whereas the other system has a total energy of -7.18 Ha.
Figure 3: Two systems (one indicated by solid black lines, the other by dashed orange lines -- details in the main text) that illustrate how an implicit dependence on the $(N+1)$-electron density can lead to ill-conditioning. They are both two-electron molecular systems wherein both electrons are confined to the left well (b). In this region, the difference between the external potentials of the two systems (c) is close to 0, leading to similar electron densities. However, the corresponding three-electron densities (a) are spread between both wells, and the potentials differ significantly in the right well. This leads to large differences between the three-electron densities and quantities that depend on them, such as the affinity and fundamental gap. As a consequence, the changes in these properties of the two-electron systems are disproportionate to the small change in the two-electron density.
Figure 4: The local Lipschitz constant $\hat{L}$ is a lower bound on the maximum of a function or functional's rate of change within a restricted domain. We attain a simple estimate by computing the maximum finite difference quotient between all pairs of systems in our dataset. $\hat{L}$ is calculated for all density functionals (orange bars) and potential functionals (grey hatched bars), enabling a comparison of how rapidly each varies with respect to its input. The ill-conditioning effects we have identified manifest as values of $\hat{L}$ on the order of $10^2$ for $E[n]$, $I[n]$, $A[n]$ and $E_g[n]$. For $F[n]$, which does not suffer from these effects, $\hat{L}$ is two orders of magnitude smaller. Similarly, $\hat{L}$ is on the order of $10^0$ for all potential functionals, indicating the absence of ill-conditioning.
Figure 5: The MAE in our ML approximations to density functionals (orange bars) and potential functionals (grey hatched bars). For the density functionals, the error spans several orders of magnitude: from $10^{-6}$ Ha ($F[n]$) to $10^{-4}$ Ha ($E[n]$ and $I[n]$) to $10^{-2}$ Ha ($A[n]$ and $E_g[n]$). In contrast, the error is far more balanced between the different potential functionals, on the order of $10^{-5}$ Ha for all quantities. This is in close parallel to the pattern in Figure 4, where we observe much greater Lipschitz constants for other density functionals compared to $F[n]$, but more of a balance in the potential functionals. As such, we can infer that ill-conditioning is a primary determinant of the error in these results, explaining the significantly lower accuracy observed for $E[n]$, $I[n]$, $A[n]$, and $E_g[n]$.
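
The estimate of $\hat{L}$ described in the Figure 4 caption (the maximum finite difference quotient over all pairs of systems) can be sketched in a few lines. This is only an illustration, not the authors' code: the function names, the choice of a discretised L2 norm as the distance between inputs, and the toy data are all assumptions made here.

```python
from itertools import combinations
from math import sqrt


def l2_distance(a, b, dx=1.0):
    """Discretised L2 distance between two functions sampled on a uniform grid.

    The choice of norm is an assumption; the source does not state which one is used.
    """
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) * dx)


def estimate_local_lipschitz(inputs, outputs, dx=1.0, tol=1e-12):
    """Largest finite difference quotient |f_i - f_j| / ||x_i - x_j|| over all pairs."""
    best = 0.0
    for (x_i, f_i), (x_j, f_j) in combinations(zip(inputs, outputs), 2):
        dist = l2_distance(x_i, x_j, dx)
        if dist < tol:
            continue  # identical inputs: the quotient is undefined, so skip the pair
        best = max(best, abs(f_i - f_j) / dist)
    return best


# Toy data (hypothetical, not from the paper): three "densities" on a 2-point grid.
densities = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
smooth = [0.0, 0.1, 0.0]  # output varies slowly with the input
steep = [0.0, 0.1, 5.0]   # output jumps between two nearly identical inputs

print(estimate_local_lipschitz(densities, smooth))
print(estimate_local_lipschitz(densities, steep))
```

In the toy data, the first and third inputs are almost identical, so a large difference in their outputs dominates the estimate for `steep`. This mirrors the near-constant-shift pairs of Figure 2. Because only sampled pairs are considered, the result can only underestimate the true maximum rate of change.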

Theorems & Definitions (1)

Definition 1
