Convergence of variational Monte Carlo simulation and scale-invariant pre-training

Nilin Abrahamsen, Zhiyan Ding, Gil Goldshlager, Lin Lin

TL;DR

The paper addresses convergence for variational Monte Carlo applied to neural-network wave functions in electronic structure by analyzing both energy minimization and scale-invariant supervised pre-training. It leverages the scale-invariant Rayleigh quotient and introduces a directionally unbiased gradient estimator to prove convergence bounds for SGD-like updates with MCMC sampling. A scale-invariant loss is proposed for pre-training, with theoretical guarantees mirroring nonconvex SGD rates, and numerical experiments demonstrate faster pre-training and plausible VMC convergence on small, strongly correlated systems. The results suggest scalable, principled guidance for optimizing neural quantum states and point toward extensions to alternative optimization schemes and manifold-based formulations.

Abstract

We provide theoretical convergence bounds for the variational Monte Carlo (VMC) method as applied to optimize neural network wave functions for the electronic structure problem. We study both the energy minimization phase and the supervised pre-training phase that is commonly used prior to energy minimization. For the energy minimization phase, the standard algorithm is scale-invariant by design, and we provide a proof of convergence for this algorithm without modifications. The pre-training stage typically does not feature such scale-invariance. We propose using a scale-invariant loss for the pre-training phase and demonstrate empirically that it leads to faster pre-training.
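
The scale invariance mentioned above can be illustrated with a finite-dimensional Rayleigh quotient. This is a minimal sketch rather than the paper's algorithm: the small symmetric matrix `H` stands in for the Hamiltonian and the vector `v` for the wave function, and all names are hypothetical. It shows that rescaling `v` leaves the quotient unchanged and that the gradient is orthogonal to `v`.

```python
def matvec(H, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def rayleigh_quotient(H, v):
    """R(v) = <v, H v> / <v, v>; unchanged when v is multiplied by c != 0."""
    return dot(v, matvec(H, v)) / dot(v, v)


def rayleigh_gradient(H, v):
    """Gradient of R for symmetric H: 2 (H v - R(v) v) / <v, v>."""
    hv = matvec(H, v)
    r = rayleigh_quotient(H, v)
    n = dot(v, v)
    return [2.0 * (hv_i - r * v_i) / n for hv_i, v_i in zip(hv, v)]


# Toy stand-ins (assumptions for illustration only).
H = [[2.0, 1.0], [1.0, 3.0]]
v = [1.0, 2.0]
scaled = [5.0 * x for x in v]

same_value = abs(rayleigh_quotient(H, v) - rayleigh_quotient(H, scaled)) < 1e-12
radial_component = dot(rayleigh_gradient(H, v), v)  # vanishes up to rounding
```

Because the gradient has no component along `v`, a gradient step changes the direction of `v` to first order but not its scale, which is the finite-dimensional picture behind a scale-invariant objective.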

Paper Structure

This paper contains 21 sections, 11 theorems, 70 equations, and 4 figures.

Key Result

Theorem 2.1

The gradient of the expected reward is where $R(\tau)$ is the reward of trajectory $\tau$, and $s_t,a_t$ are the states and actions in the trajectory.
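
For orientation, the policy gradient theorem is usually stated in the following form. This is a sketch of the standard statement, not a quotation from the paper: the symbol $\pi_\theta$ (a policy with parameters $\theta$) is an assumption and does not appear in the text above.

$$\nabla_\theta\, \mathbb{E}_{\tau\sim\pi_\theta}\big[R(\tau)\big] = \mathbb{E}_{\tau\sim\pi_\theta}\Big[R(\tau)\sum_{t}\nabla_\theta \log \pi_\theta(a_t\mid s_t)\Big]$$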

Figures (4)

  • Figure 1: Convergence of supervised pre-training using the scale-invariant training loss (orange) vs. the training loss of von Glehn et al. (2023) (blue). In both cases the plotted quantity is the sine of the angle between the target state and the trained state, where the angle is defined in $\mathscr L^2(\mathbb R^{3n},\rho=|\varphi|^2)$, i.e. with respect to the measure induced by the target state density.
  • Figure 2: Atomic configuration for the square H$_4$ model.
  • Figure 3: Convergence of VMC run on the H$_4$ square. The running minimum is taken to smooth out the data and match the form of \ref{cor:VMC}.
  • Figure 4: Lipschitz constant for VMC run on the H$_4$ square with 1000 walkers. The constant is numerically approximated using the formula $|G(\theta_{m+1})- G(\theta_m)|/|\theta_{m+1} - \theta_m|$.
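
The quantities plotted in Figures 1, 3 and 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the sine of the angle follows a literal reading of the Figure 1 caption (inner products taken as averages over samples assumed to be drawn from $\rho$), $G$ is treated as a vector-valued function of $\theta$ since the caption does not define it, and $|\cdot|$ is taken to be the Euclidean norm. All function names are hypothetical.

```python
import math


def sine_of_angle(psi_vals, phi_vals):
    """Sine of the angle between psi and phi in L^2(rho).

    Inputs are values of the trained state psi and the target state phi at
    the same sample points, assumed drawn from rho. The result does not
    change if psi_vals is multiplied by a nonzero constant.
    """
    n = len(psi_vals)
    inner = sum(p * q for p, q in zip(psi_vals, phi_vals)) / n
    norm_psi_sq = sum(p * p for p in psi_vals) / n
    norm_phi_sq = sum(q * q for q in phi_vals) / n
    cos_sq = inner * inner / (norm_psi_sq * norm_phi_sq)
    # Clamp at zero to guard against rounding when the states are parallel.
    return math.sqrt(max(0.0, 1.0 - cos_sq))


def running_minimum(values):
    """Smallest value seen so far at each step, as used to smooth Figure 3."""
    out, best = [], math.inf
    for v in values:
        best = min(best, v)
        out.append(best)
    return out


def euclidean_norm(v):
    return math.sqrt(sum(x * x for x in v))


def lipschitz_estimates(thetas, g_values):
    """Ratios |G(theta_{m+1}) - G(theta_m)| / |theta_{m+1} - theta_m|.

    Assumes consecutive parameter vectors differ, so no denominator is zero.
    """
    estimates = []
    for m in range(len(thetas) - 1):
        d_theta = [a - b for a, b in zip(thetas[m + 1], thetas[m])]
        d_g = [a - b for a, b in zip(g_values[m + 1], g_values[m])]
        estimates.append(euclidean_norm(d_g) / euclidean_norm(d_theta))
    return estimates
```

For a linear map such as $G(\theta) = 2\theta$ every ratio equals 2, which gives a quick sanity check for `lipschitz_estimates` before applying it to an actual optimization trace.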

Theorems & Definitions (21)

  • Theorem 2.1: Policy gradient theorem
  • Lemma 4.1
  • Theorem 4.3
  • Corollary 4.4
  • Lemma 4.5
  • Remark 4.6
  • Remark 4.7
  • Theorem 4.9
  • Corollary 4.10
  • Proof of \ref{eqn:gradient_L_VMC}
  • ...and 11 more