Transferable Neural Wavefunctions for Solids
Leon Gerard, Michael Scherbela, Halvard Sutterud, Matthew Foulkes, Philipp Grohs
TL;DR
This work develops transferable neural wavefunctions for solids to reduce the computational cost of deep-learning variational Monte Carlo (DL-VMC). The ansatz maps inexpensive mean-field orbitals to expressive neural orbitals via electron-nuclear embeddings, and a single network is shared across twists, geometries, and supercell sizes. The authors demonstrate accurate results for 1D hydrogen chains, graphene, and LiH, and obtain twist-averaged and finite-size-corrected observables with far fewer optimization steps. A key result is the transfer of a neural wavefunction trained on a 2×2×2 LiH supercell to a 3×3×3 supercell, which reduces the number of optimization steps by about a factor of 50 and yields cohesive energies in close agreement with experiment once zero-point vibrational energy (ZPVE) and finite-size corrections are included. The approach reduces the need for separate per-geometry trainings, enabling scalable DL-VMC studies of large solid-state systems and potentially opening pathways to metals and high-temperature semiconductors with realistic many-electron treatments.
Abstract
Deep-Learning-based Variational Monte Carlo (DL-VMC) has recently emerged as a highly accurate approach for finding approximate solutions to the many-electron Schrödinger equation. Despite its favorable scaling with the number of electrons, $\mathcal{O}(n_\text{el}^{4})$, the practical value of DL-VMC is limited by the high cost of optimizing the neural network weights for every system studied. To mitigate this problem, recent research has proposed optimizing a single neural network across multiple systems, reducing the cost per system. Here we extend this approach to solids, where similar but distinct calculations using different geometries, boundary conditions, and supercell sizes are often required. We show how to optimize a single ansatz across all of these variations, reducing the required number of optimization steps by an order of magnitude. Furthermore, we exploit the transfer capabilities of a pre-trained network. We successfully transfer a network, pre-trained on 2×2×2 supercells of LiH, to 3×3×3 supercells. This reduces the number of optimization steps required to simulate the large system by a factor of 50 compared to previous work.
