Entanglement and optimization within autoregressive neural quantum states

Andrew Jreissaty, Hang Zhang, Jairo C. Quijano, Juan Carrasquilla, Roeland Wiersema

TL;DR

This work analyzes how autoregressive neural quantum states (NQSs), specifically recurrent neural networks (RNNs) and autoregressive transformers (ATFs), encode entanglement in strongly correlated spin systems. By introducing a square-modulus (MOD) normalization to replace the conventional softmax, the authors reveal entanglement transitions and spectral statistics that resemble thermal and many-body-localized (MBL) regimes, and show that entanglement can be tuned via hyperparameters such as the hidden/embedding dimension and the Gaussian width of the parameter initialization. They construct entanglement phase diagrams, quantify scaling with system size, and connect these properties to variational Monte Carlo performance for finding ground states of models such as the transverse-field Ising model (TFIM) and the Heisenberg chain, uncovering optimal initialization strategies. The findings offer a path to more effective ground-state searches and provide insight into the expressive power and phase structure of autoregressive NQSs, with implications for simulating highly entangled quantum matter. The work also demonstrates that circulant attention in ATFs and MOD normalization can preserve or enhance entanglement, informing the future design of NQS architectures for quantum many-body problems.

Abstract

Neural quantum states (NQSs) are powerful variational ansätze capable of representing highly entangled quantum many-body wavefunctions. While the average entanglement properties of ensembles of restricted Boltzmann machines are well understood, the entanglement structure of autoregressive NQSs such as recurrent neural networks and transformers remains largely unexplored. We perform large-scale simulations of ensembles of random autoregressive wavefunctions for chains of up to $256$ spins and uncover signatures of transitions in their average entanglement scaling, entanglement spectra, and correlation functions. We show that the standard softmax normalization of the wavefunction suppresses entanglement and fluctuations, and introduce a square modulus normalization function that restores them. Finally, we connect the insights gained from our entanglement and activation function analysis to initialization strategies for finding the ground states of strongly correlated Hamiltonians via variational Monte Carlo.
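To make the contrast between the two normalizations concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the square modulus normalization takes the form $p_i = r_i^2 / \sum_j r_j^2$ for real pre-activations $r_i$; the exact definition used in the paper is not reproduced here, and the function names and the example vector `r` are hypothetical.

```python
import math

def softmax(r):
    """Standard softmax: p_i = exp(r_i) / sum_j exp(r_j)."""
    m = max(r)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in r]
    total = sum(exps)
    return [e / total for e in exps]

def square_modulus(r):
    """Assumed MOD form: p_i = r_i^2 / sum_j r_j^2 (an assumption, not taken from the paper)."""
    squares = [x * x for x in r]
    total = sum(squares)
    if total == 0.0:
        # Assumption: fall back to a uniform distribution for an all-zero input.
        return [1.0 / len(r)] * len(r)
    return [s / total for s in squares]

# Hypothetical pre-activations for the two spin outcomes, rescaled to mimic
# a wider parameter initialization.
r = [0.3, -0.1]
for scale in (1.0, 10.0, 100.0):
    scaled = [scale * x for x in r]
    print(scale, softmax(scaled), square_modulus(scaled))
```

Under this assumed form, rescaling the pre-activations sharpens the softmax output toward a single outcome, while the square-modulus output is unchanged. The sketch only illustrates that mathematical difference; it does not by itself establish the entanglement behavior reported in the abstract.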

Paper Structure

This paper contains 30 sections, 47 equations, 18 figures, 3 tables.

Figures (18)

  • Figure 1: (a) RNN and (b) ATF architectures. (a) The RNN cell (green) applies a generally nonlinear activation function $f$ to a linearly transformed hidden state $\boldsymbol{h}_{n-1}$ and input vector $\boldsymbol{\sigma}_{n-1}$ (Eq. (\ref{eq:equation_RNNcell})). The new hidden state is transformed in the output layer to produce the conditional probability (blue) and phase (gray) vectors, as per Eqs. (\ref{eq:conditionals_RNNoutputLayer}) and (\ref{eq:phaseComponents_RNNoutputLayer}). (b) The ATF directly accepts as input all previously sampled spins $\boldsymbol{\sigma}_1, \boldsymbol{\sigma}_2, \ldots, \boldsymbol{\sigma}_{n-1}$. Each spin is embedded separately into a $d_\text{emb}$-dimensional space, and from there undergoes the transformations detailed in Sec. \ref{subsec:Architectures_Methods_ATF} to produce the final conditional probability (blue) and phase (gray) vectors. We note that $\boldsymbol{r}_1$ denotes the argument of $f$ in Eq. (\ref{eq:equation_RNNcell}), while $\boldsymbol{r}_2$ and $\boldsymbol{r}_3$ denote the arguments of the output layer activation functions of the RNN and ATF.
  • Figure 2: Average bipartite entanglement entropy divided by the maximal entropy $\overline{S_2^A}/ \left(L_A\log{2}\right)$ across $N_\text{init}$ random $20$-spin RNN wavefunctions as a function of the hidden unit dimension $d_h$ and the Gaussian distribution width $\sigma$. The different panels explore various combinations of the activation functions $f$ and $g$ (as per Eqs. (\ref{eq:activation_function_f}) and (\ref{eq:activation_function_g})), namely (A) $f=\tanh$, $g=\text{SM}$, (B) $f=\mathbb{1}$, $g=\text{SM}$, (C) $f=\tanh$, $g=\text{MOD}$ and (D) $f=\tanh$, $g=\mathbb{1}$. The wavefunctions are complex in the first row (A1-D1) and positive in the second row (A2-D2), while the averages are taken over $N_\text{init}=100$ random states for the autoregressive architectures (A-C) ($10^5$ samples used for the MC estimation), and $N_\text{init}=20$ otherwise (D) (96000 samples used here). In the presence of the softmax, adding a hyperbolic tangent in the fully connected layer on top of the linear transformation induces a richer structure of entanglement entropy across the random wavefunction ensemble, while a comparison of (A1) with (C1) and (D1) highlights the suppression of entanglement induced by the softmax in the large $\sigma$ regime for most values of $d_h > 0$.
  • Figure 3: Ratio of the average entanglement entropy and the maximal entropy $\overline{S_2^A}/ \left(L_A\log{2}\right)$ as a function of $\sigma$ at $d_h=40$ for the four complex RNN wavefunction architectures of Fig. \ref{fig:EntangPhaseDiagrams_RNN}. The error bars correspond to the variance of the entropies in the random ensemble, and do not include the minimal-by-comparison RNN sampling errors induced by the Monte Carlo estimation.
  • Figure 4: Average bipartite entanglement entropy $\overline{S_2^A}$ as a function of system size $L$ for (a) $(f,g)=(\mathbb{1},\text{SM})$ and (b) $(f,g)=(\tanh,\text{SM})$ complex random RNN wavefunction ensembles, at various values of $\sigma\in[0.1,10.1]$ and fixed $d_h = 40$. In (a), $2\times10^4$ Monte Carlo samples are used for the estimation, while $10^5$ samples are used in (b). Each data point represents an average over $N_\text{init}=400$ states. The dashed curves are best fits of the form $S_\text{fit}(L)=aL^\nu + b\log L+c$, with the fitted parameters displayed in Tab. \ref{table:RNN_scaling_values} in App. \ref{appendix:entang_scaling_parameter_values}. The error bars shown are the errors associated with the random ensemble, with the RNN sampling errors negligible by comparison for the data points we include. We discard points in (b) for which the sampling error starts to dominate, which occurs at larger entropies and large $L$, causing the average entropy values to blow up and produce unphysical results. The inset in (b) zooms into the results at $L\leq32$ (and fits $S_\text{fit}$ to only those data points) and shows that, had we restricted our analysis to smaller systems, we might have made confident but insufficiently backed claims about volume laws for $\sigma \lesssim 0.6$.
  • Figure 5: Entanglement spectrum level statistics presented in the form of the distribution $P(r)$ of the ratio of adjacent energy gaps for $20$-spin random complex RNN wavefunction ensembles with (a) $(f,g) = (\tanh, \text{SM})$, (b) $(f,g) = (\mathbb{1}, \text{SM})$ and (c) $(f,g) = (\tanh, \text{MOD})$. Each ensemble consists of $5000$ random states, and the reduced density matrices are diagonalized exactly. Only eigenvalues of $\hat{\rho}_A$ satisfying $\lambda^n_{\rho_A}\geq10^{-10}$ were included in the calculation, to avoid possible numerical precision issues brought about by even smaller values. The GOE, GUE, GSE, Poisson and semi-Poisson distributions of random matrix theory \cite{AtasLevelStats2013} are plotted for comparison. The RNN produces GUE statistics in the peak regions of entanglement entropy ($\sigma \in \sim [0.2,0.7]$ in (a), $\sigma \geq \sim 0.2$ in (c)), while the statistics tend to Poisson as we move away from those regions. In the inset of (a), the average eigenvalues of the reduced density matrices corresponding to the six wavefunction ensembles that reproduce GUE statistics in (a) ($\sigma=0.2,0.3,\ldots,0.7$) are displayed alongside the Marchenko-Pastur distribution of random matrix theory \cite{MarchenkoPastur1967}.
  • ...and 13 more figures
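
The level-statistics analysis described in the Figure 5 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes the entanglement levels are $\xi_n = -\log \lambda_n$ for the eigenvalues $\lambda_n$ of the reduced density matrix, and it uses the common convention $r_n = \min(\delta_n, \delta_{n+1}) / \max(\delta_n, \delta_{n+1})$ with $\delta_n = \xi_{n+1} - \xi_n$, which may differ from the convention used in the paper. The $10^{-10}$ cutoff is the one stated in the caption; the function names are hypothetical.

```python
import math

def entanglement_levels(eigenvalues, cutoff=1e-10):
    """Map reduced-density-matrix eigenvalues to levels xi = -log(lambda).

    Eigenvalues below `cutoff` are dropped, mirroring the 1e-10 threshold
    quoted in the Figure 5 caption.
    """
    return [-math.log(lam) for lam in eigenvalues if lam >= cutoff]

def gap_ratios(levels):
    """Adjacent gap ratios r_n = min(d_n, d_{n+1}) / max(d_n, d_{n+1}).

    Assumed convention; pairs of gaps that are both zero are skipped.
    """
    xs = sorted(levels)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    ratios = []
    for d1, d2 in zip(gaps, gaps[1:]):
        larger = max(d1, d2)
        if larger > 0.0:
            ratios.append(min(d1, d2) / larger)
    return ratios

def mean_gap_ratio(levels):
    """Average of the gap ratios, or None if fewer than three distinct levels exist."""
    ratios = gap_ratios(levels)
    return sum(ratios) / len(ratios) if ratios else None
```

With this min/max convention, the mean ratio is $2\ln 2 - 1 \approx 0.386$ for Poisson statistics and roughly $0.60$ for GUE, so a histogram of `gap_ratios` collected over an ensemble of random states gives a quick check of which regime an ensemble falls into.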