Information-Theoretic Scaling Laws of Neural Quantum States

Yiming Lu; Sriram Bharadwaj; Dikshant Rathore; Di Luo

Abstract

We establish an information-theoretic scaling law for generic autoregressive neural quantum states, determined by the middle-cut mutual information of the wavefunction amplitude. By formalizing the virtual bond as an effective information channel across a sequence bipartition, we rigorously prove that exact autoregressive representation of a quantum state requires the virtual-bond dimension to scale with the amplitude mutual information. For stabilizer-state families, we show that this law yields an explicit, analytical rank formula. Applying this framework to quantum-state tomography, ground-state learning, and finite-temperature learning, our numerical experiments expose precise exponent matching, architecture-dependent scaling differences between recurrent and Transformer neural quantum states, and the critical role of autoregressive basis ordering. These results establish a rigorous physical link between the intrinsic structure of a quantum many-body state and the corresponding neural-network capacity required for its faithful representation.
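
The central quantity in the abstract, the middle-cut mutual information of the amplitude, can be illustrated with a minimal sketch. It assumes that the "amplitude distribution" is the Born distribution $p(\bm s)=|\psi(\bm s)|^2$ in the computational basis and that the cut is placed at $m=\lfloor n/2\rfloor$; the function name and the two test states are illustrative choices, not taken from the paper.

```python
import numpy as np

def middle_cut_mutual_information(psi, n):
    """Mutual information in bits between the first floor(n/2) sites (A)
    and the remaining sites (B) for p(s) = |psi(s)|^2 (assumed definition)."""
    p = np.abs(np.asarray(psi, dtype=complex)) ** 2
    p = p / p.sum()
    m = n // 2
    # Rows index the prefix A, columns index the suffix B.
    joint = p.reshape(2 ** m, 2 ** (n - m))
    pa = joint.sum(axis=1, keepdims=True)   # marginal on A
    pb = joint.sum(axis=0, keepdims=True)   # marginal on B
    mask = joint > 0                        # skip zero-probability strings
    ratio = joint[mask] / (pa @ pb)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))

n = 6
# GHZ state: only the all-zeros and all-ones strings have weight.
ghz = np.zeros(2 ** n)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)
# Uniform product state |+>^n: every string is equally likely.
plus = np.full(2 ** n, 2 ** (-n / 2))

print(middle_cut_mutual_information(ghz, n))   # one bit crosses the cut
print(middle_cut_mutual_information(plus, n))  # no cross-cut correlation
```

The GHZ state carries one bit across the cut regardless of $n$, whereas the product state carries none; target families whose middle-cut value grows with $n$ are the ones for which the scaling law becomes restrictive.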

Paper Structure (26 sections, 5 theorems, 209 equations, 7 figures)

Virtual-Bond Definition and Architecture Mapping
Proof of the VB Scaling Law
Finite precision assumption
Lipschitz continuity assumption
CMI for stabilizer states in the $Z$ basis
Checkerboard stabilizer family with tunable middle-cut amplitude complexity
Lattice and five-body parity checks
Uniformly distributed check centers
The associated CSS stabilizer state
Scaling of the middle-cut CMI
Explicit CMI formula for the toric code
Edge-label order.
Even system size: $L=2k$ with $k\in\mathbb{Z}^+$.
Odd system size: $L=2k+1$ with $k\in\mathbb{Z}_{\ge 0}$.
Final formula.
...and 11 more sections
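
For the stabilizer-state sections listed above, a rank computation over GF(2) gives a feel for how a cut mutual information can reduce to linear algebra. The sketch below assumes that the $Z$-basis amplitude distribution is uniform over the row space of a binary generator matrix `G` (as for a CSS state whose support is a linear code); under that assumption the cut mutual information in bits is $\operatorname{rank}(G_A)+\operatorname{rank}(G_B)-\operatorname{rank}(G)$. The paper's own formula is not reproduced in this summary and may be stated differently; `gf2_rank` and `cut_mutual_information_bits` are hypothetical helper names.

```python
import numpy as np

def gf2_rank(mat):
    """Rank of a binary matrix over GF(2) by Gaussian elimination."""
    a = np.array(mat, dtype=np.uint8) % 2
    rows, cols = a.shape
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        # Find a pivot row at or below the current rank with a 1 in column c.
        pivot = next((r for r in range(rank, rows) if a[r, c]), None)
        if pivot is None:
            continue
        a[[rank, pivot]] = a[[pivot, rank]]
        # Clear column c in every other row (XOR is addition mod 2).
        for r in range(rows):
            if r != rank and a[r, c]:
                a[r] ^= a[rank]
        rank += 1
    return rank

def cut_mutual_information_bits(G, m):
    """I(A:B) in bits for the uniform distribution over the row space of G,
    with A the first m columns and B the remaining columns."""
    G = np.array(G, dtype=np.uint8) % 2
    return gf2_rank(G[:, :m]) + gf2_rank(G[:, m:]) - gf2_rank(G)

repetition = [[1, 1, 1, 1]]                # GHZ-like support
two_pairs = [[1, 0, 1, 0], [0, 1, 0, 1]]   # two correlated pairs across the cut
product = np.eye(4, dtype=np.uint8)        # all bit strings, no correlation

print(cut_mutual_information_bits(repetition, 2))
print(cut_mutual_information_bits(two_pairs, 2))
print(cut_mutual_information_bits(product, 2))
```

The formula follows because the marginals of a uniform distribution over a linear code are uniform over the projected codes, so each entropy equals the corresponding GF(2) rank.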

Key Result

Theorem 1

Consider a target family in a fixed basis $\mathcal{S}$, and let $m=\lfloor n/2\rfloor$ with $A=\bm s_{<m}$ and $B=\bm s_{\ge m}$. If an autoregressive neural quantum state represents the corresponding amplitude distributions exactly across system sizes, then

Figures (7)

Figure 1: Virtual-bond scaling law for ARNN-NQS. Left: in an ARNN-NQS, information from the prefix $\bm s_{\leq i}$ is passed across the cut through a virtual bond, whose effective dimension $\gamma_i$ quantifies the cross-cut information retained by the model. Right: the scaling required of $\gamma_i$ is set by the amplitude complexity of the target family in the chosen basis. A basis permutation can change the required virtual-bond dimension dramatically, for example by converting a target with large $\gamma_i$ scaling into one with smaller or even $O(1)$ scaling. The middle-cut mutual information of the quantum state amplitude directly determines the virtual-bond dimension required for faithful representation.
Figure 2: Application I: checkerboard-stabilizer tomography. Left, log-log plot of the half-cut CMI $\mathcal{I}(L)$ versus $L$ for tunable families in which the number of parity checks scales as $L^{2\gamma}$; fitted slopes are $0.670$, $0.702$, and $1.000$ for $\gamma=0.5,0.7,1.0$. Right, log-log plot of the minimal RNN hidden width $n_{d}^{\min}(L)$ required to reach $95\%$ fidelity; fitted slopes are $0.620$, $0.767$, and $1.057$. The close agreement between the two sets of exponents shows that the required RNN VB dimension tracks the scaling of the target amplitude complexity.
Figure 3: Application II: toric-code benchmark. Left, fidelity versus hidden width $n_d$ for RNN training on system sizes from $7\times7$ to $10\times10$. Right, minimal width $n_d^{\min}$ versus linear size $L$ at fixed target fidelity ($95\%$) for RNN and autoregressive Transformer models. The required RNN width grows with system size, whereas the Transformer width remains nearly constant over the tested range, consistent with the different scaling of virtual-bond dimension in recurrent and attention-based architectures.
Figure 4: Application III: finite-temperature representation at $J=1$, $h=0.6$, $\beta=0.1$. Left, plot of the half-cut CMI versus $n$ for two orderings of the doubled basis. The numerical method for the CMI calculation is described in \ref{app:TFD-fermion-amplitude}. The separate ordering is nearly linear (fitted slope $0.997$, $\mathcal{I}\propto n^{0.997}$), whereas the alternate ordering remains small and nearly constant. Right, minimal RNN hidden width $n_d^{\min}$ at fixed target fidelity for the same two orderings. For the separate ordering, $n_d^{\min}$ grows with $n$ (fitted slope $1.205$, $n_d^{\min}\propto n^{1.205}$); for the alternate ordering, the target fidelity is reached with nearly constant hidden width over the tested range.
Figure S5: Virtual-bond realizations for the two autoregressive architectures used in numerics. Left panel, RNN: the recurrent update transmits information through $h_{i-1}\!\to h_i$, so the recurrent state is the virtual bond. Right panel, autoregressive Transformer: prefix information is carried by the cached key-value memory through $kv_{<i}\!\to kv_{\le i}$, so the cache is the virtual bond. The figure gives the model-specific map from theorem-level dimension $\gamma_i$ to architecture size.
...and 2 more figures
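
The fitted slopes quoted in the captions above are exponents read off log-log plots. A minimal sketch of such a fit is given below, assuming an ordinary least-squares line through $(\log x, \log y)$; the data are synthetic and hypothetical, generated from an exact power law, and are not the paper's measurements. The helper name `fit_power_law` is likewise an illustrative choice.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y ~ c * x**alpha by linear regression in log-log space.
    Returns (alpha, c)."""
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return slope, np.exp(intercept)

# Hypothetical data: an exact power law with exponent 0.7 and prefactor 3.
sizes = np.array([4.0, 6.0, 8.0, 10.0, 12.0])
values = 3.0 * sizes ** 0.7

alpha, c = fit_power_law(sizes, values)
print(alpha, c)
```

Comparing the exponent fitted to the mutual information with the exponent fitted to the minimal hidden width is the kind of check the captions describe; with noisy data the two fits would agree only within their uncertainties.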

Theorems & Definitions (9)

Definition 1: ARNN-NQS Virtual Bond
Theorem 1: VB Scaling Law
Theorem 2: ARNN-NQS VB Scaling Law for Stabilizer State Representation
Lemma 1: Mutual information is bounded by ordinary rank
proof
Theorem 3: VB Scaling Law, Finite Precision Assumption
proof
Theorem 4: VB Scaling Law, Lipschitz Assumption
proof
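
Lemma 1 is listed above by title only. Assuming it refers to the bound $I(A:B)\le\log\operatorname{rank}(P)$ for a joint probability matrix $P$ with rows indexed by $A$ and columns by $B$, the bound can be checked numerically as follows. The construction of the low-rank distribution, the variable names, and the choice of base-2 logarithms are all assumptions made for illustration.

```python
import numpy as np

def mutual_information_bits(joint):
    """Mutual information in bits of a joint probability matrix."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])))

rng = np.random.default_rng(0)
dim_a, dim_b, r = 16, 16, 3

# A nonnegative product of a (16 x 3) and a (3 x 16) factor has rank at most 3.
U = rng.random((dim_a, r))
V = rng.random((r, dim_b))
joint = U @ V
joint /= joint.sum()

rank = int(np.linalg.matrix_rank(joint))
mi = mutual_information_bits(joint)
print(rank, mi, np.log2(rank))
```

In this example the mutual information stays below $\log_2 3$, even though the matrix is $16\times16$: the rank across the cut, not the size of the two halves, limits how much information can be shared.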
