A Kaczmarz-inspired approach to accelerate the optimization of neural network wavefunctions

Gil Goldshlager, Nilin Abrahamsen, Lin Lin

TL;DR

SPRING is a Kaczmarz-inspired optimizer for variational Monte Carlo (VMC) training of neural network wavefunctions. It blends the minimal-norm MinSR update with randomized projections so that information from earlier minibatches carries over to the current step. By regularizing and stabilizing the projection, and coupling it with a norm constraint, SPRING converges faster and reaches higher accuracy than MinSR and KFAC on small atoms and molecules when the learning rates of all methods are tuned. On the oxygen atom, for example, it reaches chemical accuracy after forty thousand training iterations, whereas MinSR and KFAC do not reach it within one hundred thousand. The work provides detailed algorithmic development, numerical results, and hyperparameter analyses, while acknowledging the absence of rigorous convergence guarantees in the VMC context. By reducing the optimization bottleneck without sacrificing stability, the approach offers a practical path toward scaling neural network wavefunctions to larger systems.

Abstract

Neural network wavefunctions optimized using the variational Monte Carlo method have been shown to produce highly accurate results for the electronic structure of atoms and small molecules, but the high cost of optimizing such wavefunctions prevents their application to larger systems. We propose the Subsampled Projected-Increment Natural Gradient Descent (SPRING) optimizer to reduce this bottleneck. SPRING combines ideas from the recently introduced minimum-step stochastic reconfiguration optimizer (MinSR) and the classical randomized Kaczmarz method for solving linear least-squares problems. We demonstrate that SPRING outperforms both MinSR and the popular Kronecker-Factored Approximate Curvature method (KFAC) across a number of small atoms and molecules, given that the learning rates of all methods are optimally tuned. For example, on the oxygen atom, SPRING attains chemical accuracy after forty thousand training iterations, whereas both MinSR and KFAC fail to do so even after one hundred thousand iterations.
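
For readers unfamiliar with the classical randomized Kaczmarz method mentioned above, the following is a minimal sketch of that method alone, applied to a small consistent linear system. It is not the SPRING optimizer and does not reproduce anything from the paper. The function name `randomized_kaczmarz`, the problem size, the iteration count, and the choice to sample rows with probability proportional to their squared norm are all assumptions made for illustration.

```python
import numpy as np


def randomized_kaczmarz(A, b, num_iters=2000, seed=0):
    """Illustrative randomized Kaczmarz iteration for A x = b (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Squared Euclidean norm of every row; used for sampling and for the step size.
    row_norms_sq = np.sum(A * A, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    # Start from zero (an assumption; any starting point works for a consistent system).
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        # Pick one equation at random, favoring rows with larger norm.
        i = rng.choice(A.shape[0], p=probs)
        # Project the current iterate onto the hyperplane {x : A[i] @ x = b[i]}.
        residual = b[i] - A[i] @ x
        x = x + (residual / row_norms_sq[i]) * A[i]
    return x


# Small synthetic, consistent, overdetermined system (assumed test problem).
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
x_true = rng.standard_normal(5)
b = A @ x_true

x_hat = randomized_kaczmarz(A, b)
error = np.linalg.norm(x_hat - x_true)
print("distance to true solution:", error)
```

Each step uses only one row of the system yet keeps the progress made by all previous steps, since the new iterate is a projection of the old one. This reuse of history across subsampled equations is the property the abstract refers to when it says SPRING draws on the Kaczmarz method.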

Paper Structure

This paper contains 19 sections, 48 equations, 11 figures, 5 tables, 1 algorithm.

Figures (11)

  • Figure 1: Learning rate sweeps on the carbon atom with four different optimizers.
  • Figure 2: Comparison of methods on three small atoms, with learning rates tuned on the carbon atom.
  • Figure 3: Learning rate sweeps on the nitrogen molecule at equilibrium bond distance with four different optimizers.
  • Figure 4: Comparison of methods on the N$_2$ molecule at two bond distances, with learning rates tuned at equilibrium.
  • Figure 5: Comparison of methods on the CO molecule, with learning rates tuned on N$_2$.
  • ...and 6 more figures