
Auto-regressive Neural Quantum State Sampling for Selected Configuration Interaction

Shane Thompson, Daniel Gunlycke

Abstract

Accurate ground-state energy calculations remain a central challenge in quantum chemistry due to the exponential scaling of the many-body Hilbert space. Variational Monte Carlo and variational quantum eigensolvers offer promising ansatz optimization approaches but face limitations in convergence as well as hardware constraints. We introduce a particular Selected Configuration Interaction (SCI) algorithm that uses auto-regressive neural networks (ARNNs) to guide subspace expansion for ground-state search. Leveraging the unique properties of ARNNs, our algorithm efficiently constructs compact variational subspaces from learned ground-state statistics, which in turn accelerates convergence to the ground-state energy. Benchmarks on molecular systems demonstrate that ARNN-guided subspace expansion combines the strengths of neural-network representations and classical subspace methods, providing a scalable framework for classical and hybrid quantum-classical algorithms.

Paper Structure

This paper contains 17 sections, 28 equations, 13 figures, and 1 table.

Figures (13)

  • Figure 1: NQS-based Selected Configuration Interaction procedure for approximating molecular ground-state energies and wavefunctions. An initial GS approximation $\ket{\Psi_\text{init}}$ generates training data (configurations with sampling frequencies) from which a neural network can be trained. Sampling the neural network adds unseen configurations to a "sampled subspace" over which we perform exact diagonalization and obtain a new GS approximation, which we either accept as $\ket{\Psi_\text{opt}}$ or use as an initiator $\ket{\Psi_i}$ for the next iteration. For $i=0$, we skip Steps 3 and 4, letting the approximation itself determine the sampled subspace.
  • Figure 2: Visual representation of iterations $i=0$ and $i=1$ of our algorithm. The initial state $\ket{\Psi_{\text{init}}}$ is sampled for important configurations, and an efficient approximation $\ket{\Psi_{i=0}}$ is constructed by diagonalizing the Hamiltonian in the sampled subspace. The NQS $\ket{\Psi_{\text{ARNN}}}$ is trained from $\ket{\Psi_{i=0}}$ and is then itself sampled. $\ket{\Psi_{\text{opt}}}$ is an energy minimum in the sampled subspace and therefore approximates the true ground state, $\ket{\Psi_{\text{GS}}}$. The three arrows correspond to (1) obtaining the $i=0$ subspace from $\ket{\Psi_\text{init}}$, (2) training $\ket{\Psi_\text{ARNN}}$ using data from $\ket{\Psi_{i=0}}$, and (3) obtaining the $i>0$ subspace from $\ket{\Psi_\text{ARNN}}$. A schematic code sketch of this loop follows the figure list.
  • Figure 3: Auto-regressive Neural Network constructed from masked-dense layers, adopted from NetKet's "ARNNDense" model. The depicted model acts on bitstrings of length six and uses two masked-dense layers with multiple features per bit (four for the first layer). The final masked-dense layer has two features, from which binomial probabilities conditioned on the preceding bits are computed. Arrows of the same color between layers correspond to all of the information fed into one particular bit of the succeeding layer from the preceding layer, and this color is reused for the same bit lying at the heads of the arrows from layer to layer. When computing wavefunction values at given configurations, the information from the input configuration is fed forward to select which neuron in the second-to-last layer is kept for computing the outputs $\log\left(\Psi_q\left(n_q\right)\right)$ (not configurations as shown in the Figure), which we then sum to compute $\log\Psi\left(n\right)$; see Eq. \ref{eq:wvfn_output}. A minimal NetKet-based sketch of this model appears after the figure list.
  • Figure 4: Modified ARNN probability distribution for various values of the inverse temperature $\beta$, for the $\text{C}_2\text{H}_2$ molecule considered in Section \ref{section:Results}. The horizontal axis groups, into bins of $20$ configurations each, the first $2000$ configurations obtained by sorting on the absolute squares (Born probabilities) of the exact wavefunction amplitudes. The blue bars give the weight of each bin for the exact GS. The orange bars are derived from direct sampling of the exact Born probabilities associated with the blue bars, with a sample size of $N_\text{T}=1.4\times 10^4$. That is, the orange weights are empirical, while the blue weights are exact. The green bars also represent empirical weights, but are obtained from a sample of size $N_\text{N}=1.4\times 10^6$ drawn from the ARNN; the ARNN itself is trained on only $N_\text{T}$ samples of the GS. A sketch of one possible form of this temperature scaling follows the figure list.
  • Figure 5: Ground-state energy error $\Delta E$ vs. iteration number for the $\text{C}_2\text{H}_2$ molecule in the STO-3G basis set, with the displayed molecular geometry taken from Ref. Joseph2010. The solid/faded green curve corresponds to the raw/temperature-scaled CISD-initiated case. When sampling from the exact ground state, we consider four cases in which the number of shots differs by an order of magnitude. We take network sample size $N_\text{N}=1.4\times 10^6$ and subspace dimension $N_\text{U}=1600$ for all curves. The horizontal dashed lines in all but the bottom two curves indicate a switch in the network architecture, where the numbers of layers and features are doubled and $N_\text{T}$ increases from $10^4$ to $10^5$. Only the larger model is used in the bottom two curves. Additional hyper-parameters: two masked-dense layers, four features per bit, and dropout rate 0.05 in the smaller model, each of which is doubled in the larger model; Adam learning rate 0.001. The temperature-scaled CISD curve takes $\beta=0.4$ for the first iteration; the HF-initiated curve takes $\beta=0.1, 0.6$ for the first two iterations.
  • ...and 8 more figures
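
The iterative procedure sketched in Figures 1 and 2 can be summarized in a few lines of pseudocode. The sketch below is schematic rather than the authors' implementation: `sample_state`, `train_arnn`, `sample_arnn`, and `build_subspace_hamiltonian` are hypothetical helpers standing in for the corresponding steps of Figure 1, and the convergence test is an assumption. In practice the subspace would also be capped at the dimension $N_\text{U}$ quoted in Figure 5; that truncation is omitted here for brevity.

```python
# Schematic sketch of the NQS-SCI loop of Figures 1-2 (not the authors' code).
# sample_state, train_arnn, sample_arnn, and build_subspace_hamiltonian are
# hypothetical stand-ins for the corresponding steps of the figures.
import numpy as np
from scipy.sparse.linalg import eigsh

def nqs_sci(psi_init, hamiltonian, n_shots, n_network_samples,
            max_iter=10, tol=1e-8):
    # i = 0: the initial approximation itself determines the sampled subspace.
    configs, _ = sample_state(psi_init, n_shots)
    basis = sorted(set(configs))
    energy, amplitudes = None, None

    for i in range(max_iter):
        # Exact diagonalization of H restricted to the sampled subspace.
        h_sub = build_subspace_hamiltonian(hamiltonian, basis)
        vals, vecs = eigsh(h_sub, k=1, which="SA")
        new_energy, amplitudes = vals[0], vecs[:, 0]

        if energy is not None and abs(new_energy - energy) < tol:
            break                                   # accept as |Psi_opt>
        energy = new_energy

        # Training data: configurations with sampling frequencies drawn from
        # the current approximation |Psi_i>.
        train_configs, train_counts = sample_state((basis, amplitudes), n_shots)
        arnn = train_arnn(train_configs, train_counts)

        # Sampling the ARNN adds unseen configurations to the subspace.
        basis = sorted(set(basis) | set(sample_arnn(arnn, n_network_samples)))

    return energy, (basis, amplitudes)
```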
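
Figure 3 is based on NetKet's `ARNNDense` model. Below is a minimal sketch, assuming NetKet 3.x, with a six-site spin-1/2 register standing in for the six-bit strings of the figure and the two-layer, four-feature configuration quoted for the smaller model in Figure 5; wrapping the model in a variational state is simply one convenient way to draw direct samples and evaluate log-amplitudes.

```python
# Minimal sketch of a Figure 3-style model, assuming NetKet 3.x: two
# masked-dense layers, four features per bit, bitstrings of length six.
import netket as nk

hi = nk.hilbert.Spin(s=1 / 2, N=6)          # six-bit configurations

# Auto-regressive model built from masked-dense layers.
model = nk.models.ARNNDense(hilbert=hi, layers=2, features=4)

# ARNNs admit direct, autocorrelation-free sampling.
sampler = nk.sampler.ARDirectSampler(hi)
vstate = nk.vqs.MCState(sampler, model, n_samples=1024)

samples = vstate.samples                     # sampled configurations n
log_psi = vstate.log_value(samples.reshape(-1, hi.size))  # log Psi(n) values
```

Direct sampling of independent configurations from $|\Psi_\text{ARNN}|^2$, with no Markov chain, is the ARNN property that makes the subspace-expansion step efficient.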
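
Figures 4 and 5 refer to a modified, temperature-scaled ARNN distribution controlled by an inverse temperature $\beta$. The paper's exact definition is not reproduced here; the sketch below assumes one common convention, raising each probability to the power $\beta$ and renormalizing, which flattens the distribution for $\beta<1$ and thereby promotes sampling of otherwise low-weight configurations.

```python
# Hedged illustration of temperature scaling: p_beta proportional to p**beta.
# This is an assumed form, not necessarily the paper's exact definition of
# the modified ARNN distribution.
import numpy as np

def temperature_scale(probs: np.ndarray, beta: float) -> np.ndarray:
    """Flatten (beta < 1) or sharpen (beta > 1) a discrete distribution."""
    scaled = probs ** beta
    return scaled / scaled.sum()

p = np.array([0.7, 0.2, 0.07, 0.03])
print(temperature_scale(p, beta=0.4))   # markedly flatter than p
print(temperature_scale(p, beta=1.0))   # unchanged
```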