Transferable Neural Wavefunctions for Solids

Leon Gerard; Michael Scherbela; Halvard Sutterud; Matthew Foulkes; Philipp Grohs

TL;DR

This work develops transferable neural wavefunctions for solids to dramatically reduce the computational cost of deep-learning variational Monte Carlo (DL-VMC). By mapping inexpensive mean-field orbitals to expressive neural orbitals via electron-nuclear embeddings, and by sharing one network across twists, geometries, and supercell sizes, the authors demonstrate accurate results for 1D hydrogen chains, graphene, and LiH, while obtaining twist-averaged and finite-size-corrected observables with far fewer optimization steps. A key achievement is transferring a neural wavefunction trained on a 2×2×2 LiH supercell to a 3×3×3 supercell, which reduces the number of optimization steps by about a factor of 50 and yields cohesive energies in close agreement with experiment when zero-point vibrational energy (ZPVE) and finite-size corrections are included. The approach reduces the need for separate per-geometry trainings, enabling scalable DL-VMC studies of large solid-state systems and potentially opening pathways to metals and high-temperature semiconductors with realistic many-electron treatments.

Abstract

Deep-Learning-based Variational Monte Carlo (DL-VMC) has recently emerged as a highly accurate approach for finding approximate solutions to the many-electron Schrödinger equation. Despite its favorable scaling with the number of electrons, $\mathcal{O}(n_\text{el}^{4})$, the practical value of DL-VMC is limited by the high cost of optimizing the neural network weights for every system studied. To mitigate this problem, recent research has proposed optimizing a single neural network across multiple systems, reducing the cost per system. Here we extend this approach to solids, where similar but distinct calculations using different geometries, boundary conditions, and supercell sizes are often required. We show how to optimize a single ansatz across all of these variations, reducing the required number of optimization steps by an order of magnitude. Furthermore, we exploit the transfer capabilities of a pre-trained network. We successfully transfer a network, pre-trained on 2x2x2 supercells of LiH, to 3x3x3 supercells. This reduces the number of optimization steps required to simulate the large system by a factor of 50 compared to previous work.

Paper Structure (32 sections, 43 equations, 4 figures, 4 tables)

Introduction
Results
1D: Hydrogen chains
Energy per atom
Metal-insulator transition
Graphene
Lithium Hydride
Discussion
Methods
Notation
Deep-learning Variational Monte Carlo
Architecture
Overview
Ansatz
Input
...and 17 more sections

Figures (4)

Figure 1: 1D Hydrogen chain: a: Extrapolation of the energy per atom to the thermodynamic limit for $R=1.8 a_0$. Results obtained using DeepSolid (neural wavefunction), lattice-regularized diffusion Monte Carlo (LR-DMC), auxiliary-field quantum Monte Carlo (AFQMC), and our transferable neural wavefunction are shown. Open markers indicate energies computed by fine-tuning a model pre-trained on smaller supercells. The shaded area depicts the statistical uncertainty in the AFQMC result. The Monte Carlo uncertainty of our results is $\approx 10\,\mu\text{Ha}$, well below the marker size. b: The modulus of the complex polarization, $|z|$, as a function of the inter-atomic separation, $R$, showing a phase transition between a metal at small $R$ and an insulator at large $R$. AFQMC, DMC, and VMC results are taken from simonscollaborationonthemany-electronproblemGroundStatePropertiesHydrogen2020. DeepSolid results are taken from li_ab_solids_2022.
Figure 2: Twist-dependent energy of graphene: a: Grid of pretrained twists and path of fine-tuned twists through the Brillouin zone. b: Fine-tuned energies of graphene along a path of twists across the Brillouin zone. The energies were fine-tuned using shared optimization and around 100 additional optimization iterations per twist. Error bars are smaller than the size of the markers.
Figure 3: Energy-volume curve of LiH per primitive cell for a $2 \times 2 \times 2$ supercell as calculated using DeepSolid li_ab_solids_2022 and our transferable DL-VMC method. The DeepSolid results (black circles, with a Birch-Murnaghan fit represented as a black line) were obtained at a single twist, the $\Gamma$-point. Hartree-Fock corrections were applied, as discussed in li_ab_solids_2022, and a ZPVE correction was added. Our results (orange circles, with a Birch-Murnaghan fit represented as an orange line) are twist averaged, using a $5 \times 5 \times 5$ Monkhorst-Pack grid per lattice constant. Structure-factor-based corrections were applied and a ZPVE correction was added. The grey bar indicates the experimental uncertainty Nolan_exp_results_lih. The statistical error bars are too small to be visible on this scale and have therefore been omitted. The vertical orange dashed line indicates the equilibrium lattice constant as calculated from the Birch-Murnaghan fit to our data. The orange cross shows the twist-averaged cohesive energy of a $3 \times 3 \times 3$ simulation cell, again using a structure-factor-based correction. This was obtained by transferring the network pre-trained for the $2 \times 2 \times 2$ system to a $3 \times 3 \times 3$ supercell, using only 8,000 additional optimization steps. A $5 \times 5 \times 5$ Monkhorst-Pack grid of twists was used. The black cross shows the result of DeepSolid's $3 \times 3 \times 3$ $\Gamma$-point calculation with a Hartree-Fock finite-size correction.
Figure S1: Scaling of computational cost: Markers correspond to measured timings, lines correspond to least-squares fits of power laws, with the fitted exponents given in the legend.
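
The power-law fits mentioned in the Figure S1 caption can be illustrated with a minimal sketch. It assumes the model $t \approx c\, n^{p}$ and performs an ordinary least-squares straight-line fit in log-log space, which may differ from the fitting procedure actually used for the figure. The function name `fit_power_law` and the timing values are hypothetical and synthetic; they are not measurements from the paper.

```python
import numpy as np


def fit_power_law(n, t):
    """Fit t ~ c * n**p by linear least squares on (log n, log t).

    Returns (c, p). This is an illustrative helper, not code from the paper.
    """
    # In log-log space the power law becomes log t = p * log n + log c.
    slope, intercept = np.polyfit(np.log(n), np.log(t), deg=1)
    return float(np.exp(intercept)), float(slope)


# Synthetic, noise-free timings generated from t = 2e-3 * n**3.
# These are placeholders for measured timings, NOT data from the paper.
n_el = np.array([8.0, 16.0, 32.0, 64.0])
timings = 2e-3 * n_el**3

prefactor, exponent = fit_power_law(n_el, timings)
print(f"fitted exponent: {exponent:.2f}, prefactor: {prefactor:.2e}")
```

Fitting in log-log space weights relative rather than absolute deviations, so small and large systems contribute comparably to the fitted exponent; a nonlinear fit on the raw timings would instead be dominated by the largest systems.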
