Attention to Quantum Complexity

Hyejin Kim; Yiqing Zhou; Yichen Xu; Kaarthik Varma; Amir H. Karamlou; Ilan T. Rosen; Jesse C. Hoke; Chao Wan; Jin Peng Zhou; William D. Oliver; Yuri D. Lensky; Kilian Q. Weinberger; Eun-Ah Kim

TL;DR

The Quantum Attention Network (QuAN) is a classical AI framework that uses attention mechanisms tailored to learning quantum complexity. It learns entanglement and state-complexity growth directly from experimental computational-basis measurements, including the growth of complexity with depth in random circuits from noisy data.

Abstract

The imminent era of error-corrected quantum computing urgently demands robust methods to characterize complex quantum states, even from limited and noisy measurements. We introduce the Quantum Attention Network (QuAN), a versatile classical AI framework leveraging the power of attention mechanisms specifically tailored to address the unique challenges of learning quantum complexity. Inspired by large language models, QuAN treats measurement snapshots as tokens while respecting their permutation invariance. Combined with a novel parameter-efficient mini-set self-attention block (MSSAB), this data structure enables QuAN to access high-order moments of the bit-string distribution and preferentially attend to less noisy snapshots. We rigorously test QuAN across three distinct quantum simulation settings: the driven hard-core Bose-Hubbard model, random quantum circuits, and the toric code under coherent and incoherent noise. QuAN directly learns the growth in entanglement and state complexity from experimentally obtained computational-basis measurements. In particular, it learns the growth in complexity of random circuit states with increasing depth from noisy experimental data. Taken to a regime inaccessible by existing theory, QuAN unveils the complete phase diagram for noisy toric code data as a function of both noise types. This breakthrough highlights the transformative potential of using purposefully designed AI-driven solutions to assist quantum hardware.
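
The idea of treating snapshots as tokens while respecting permutation invariance can be illustrated with a generic softmax attention pooling. The following is a minimal sketch, not QuAN's actual blocks: the function name `pooling_attention`, the fixed `seed` query vector (a learned parameter in a real model), and the toy snapshots are all assumptions made for illustration.

```python
import math

def pooling_attention(tokens, seed):
    """Pool a set of token vectors into one vector with softmax attention.

    tokens: list of equal-length float vectors, one per measurement snapshot.
    seed:   query vector (hypothetical; learned in a real model, fixed here).
    Returns the pooled vector and the attention weight of each token.
    """
    dim = len(seed)
    # Scaled dot-product score between the seed query and every token.
    scores = [sum(q * t for q, t in zip(seed, tok)) / math.sqrt(dim) for tok in tokens]
    # Numerically stable softmax over the set of tokens.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    # Weighted sum: the result does not depend on the order of the tokens.
    pooled = [sum(w * tok[k] for w, tok in zip(weights, tokens)) for k in range(dim)]
    return pooled, weights

# Toy snapshots (bit-strings), mapped from bits {0, 1} to values {+1, -1}.
snapshots = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
tokens = [[1.0 - 2.0 * b for b in s] for s in snapshots]
seed = [0.5, -0.25, 1.0, 0.0]

pooled, weights = pooling_attention(tokens, seed)
shuffled = [tokens[2], tokens[0], tokens[1]]
pooled_shuffled, _ = pooling_attention(shuffled, seed)
```

Reordering the snapshots leaves the pooled vector unchanged up to floating-point rounding, and the per-token weights show how such a layer can weight some snapshots more heavily than others.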

Paper Structure (8 equations, 4 figures)

This paper contains 8 equations, 4 figures.

Figures (4)

Figure 1: Learning relative complexity between states $\rho_\alpha$ and $\rho_\beta$ from bit-string collections. (a) Measurements of a quantum state $\rho$ sample bit-strings $\{B_i\}$ from the bit-string probability distribution $p(\{b_i\}|\rho)$ over the $2^{N_q}$-dimensional Hilbert space. (b) The schematic architecture of QuAN. A $Z$-basis snapshot collection of size $M$ is partitioned into sets $\{\mathbb{X}_i\}$ of size $N$. In the encoder stage, after a convolution registers the positions of the qubits, each set goes through $L$ layers of MSSAB. Inside an MSSAB, the input is further partitioned into $N_s$ mini-sets that are processed in parallel through self-attention blocks (SABs), a recurrent attention block (RecAB), and a reducing attention block (RedAB). The decoder stage compresses the encoder output, attending to its different components in a permutation-invariant manner, using a pooling attention block (PAB) and a single-layer perceptron (SLP). The output label is $y=1$ for state $\rho_\alpha$ and $y=0$ for state $\rho_\beta$. See SM section A for more details. (c-e) Examples of $\rho_\alpha$ and $\rho_\beta$ for learning relative complexity using the binary classification output of QuAN. (c) A volume-law entangled state vs. an area-law entangled state. The entanglement between two subsystems (white and grey) is indicated by blue links. (d) A random circuit state at depth $d$ vs. that at some deep reference depth. (e) Decodable vs. undecodable states of an error-correcting code under noise. Incoherent noise, depicted in grey, suppresses large loops.
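
The two-level partitioning described in panel (b) can be sketched as follows. This is only an illustration under stated assumptions: the helper `partition`, the consecutive equal-size chunking, and the values of `M`, `N`, and `N_s` are hypothetical and are not taken from the paper.

```python
def partition(items, size):
    """Split items into consecutive chunks of `size`, dropping any remainder."""
    n_full = len(items) // size
    return [items[i * size:(i + 1) * size] for i in range(n_full)]

# Hypothetical sizes: M snapshots, sets of size N, N_s mini-sets per set.
M, N, N_s = 24, 8, 2
snapshots = [f"B{i}" for i in range(M)]  # placeholders for bit-strings

# Level 1: the snapshot collection is split into sets of size N.
sets = partition(snapshots, N)

# Level 2: each set is split into N_s mini-sets (assumed equal size N // N_s).
mini_sets = [partition(s, N // N_s) for s in sets]
```

With these numbers there are `M // N = 3` sets, each holding `N_s = 2` mini-sets of 4 snapshots. Each mini-set would then be processed in parallel by its own attention block.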
Figure 2: Relative complexity between volume-law and area-law scaling states. (a) Inter-snapshot correlation reveals the $X$-$X$ correlation of the quantum state. The purple box shows the schematic of the self-attention block capturing the inter-snapshot correlation. (b) A schematic diagram of the 16-transmon-qubit chip used for quantum emulation of the driven hard-core Bose-Hubbard model. (c) The entanglement transition based on the scaling of the bipartite entanglement entropy $S = S_A A + S_V V$, where $A$ and $V$ represent the area and volume of the subsystem, respectively. Adapted from Ref. Karamlou2023. (d) A schematic of a contrasting architecture: the set multi-layer perceptron (SMLP), which respects the permutation symmetry. (e-g) The average confidence $\bar{y}$ as a function of detuning strength $\delta$ for different architectures and different set sizes $N$. The star symbols mark the training points. The averages and errors are obtained from $10$ independent training runs. For machine learning details, see SM section C2. (e) SMLP fails to train. (f) QuAN$_2$ ($N_s=1$, $L=1$). (g) QuAN$_4$ with two layers of self-attention ($N_s=1$, $L=2$).
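
The scaling form $S = S_A A + S_V V$ in panel (c) can be made concrete with a small least-squares fit that separates the area-law and volume-law coefficients. This is a sketch on synthetic numbers only. The function `fit_area_volume` and the subsystem data are hypothetical and do not reproduce the analysis in Ref. Karamlou2023.

```python
import math

def fit_area_volume(areas, volumes, entropies):
    """Least-squares fit of S ~ s_a * A + s_v * V with no intercept."""
    aa = sum(a * a for a in areas)
    vv = sum(v * v for v in volumes)
    av = sum(a * v for a, v in zip(areas, volumes))
    a_s = sum(a * s for a, s in zip(areas, entropies))
    v_s = sum(v * s for v, s in zip(volumes, entropies))
    det = aa * vv - av * av
    if det == 0:
        raise ValueError("areas and volumes are collinear; fit is ill-defined")
    # Solve the 2x2 normal equations by explicit matrix inversion.
    s_a = (a_s * vv - v_s * av) / det
    s_v = (v_s * aa - a_s * av) / det
    return s_a, s_v

# Synthetic subsystems (hypothetical): boundary size A and qubit count V.
areas = [4, 6, 6, 8]
volumes = [2, 3, 4, 4]
# Entropies generated from known coefficients so the fit can be checked.
true_s_a, true_s_v = 0.1, 0.5
entropies = [true_s_a * a + true_s_v * v for a, v in zip(areas, volumes)]

s_a, s_v = fit_area_volume(areas, volumes, entropies)
```

On this noise-free synthetic input the fit recovers the coefficients used to generate it. A large ratio `s_v / s_a` would indicate volume-law behavior and a small one area-law behavior.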
Figure 3: Relative complexity between a random circuit state at depth $d$ and the reference state at depth $d=20$. (a) Schematic illustration of the $6 \times 6$ subarray of qubits from Google's "Sycamore" quantum processor. A random circuit of depth $d$ alternates entangling iSWAP-like gates (grey) and single-qubit (SQ) gates randomly chosen from the set $\{\sqrt{X^{\pm1}}, \sqrt{Y^{\pm1}}, \sqrt{W^{\pm1}}, \sqrt{V^{\pm1}}\}$, with $W=(X+Y)/\sqrt{2}$ and $V=(X-Y)/\sqrt{2}$. The two-qubit gates are applied in a repeating series of ABCDCDAB patterns. (b) The data structure. For each depth $d$, we sample $N_c=50$ circuits. For each circuit instance $s$, we sample $M_s$ bit-strings and partition them into sets of size $N$, resulting in a total of $N_c\times M_s / N$ sets for each circuit depth $d$. (c) XEB (Eq. \ref{eq:XEB}) for bit-strings from noiseless simulations, as a function of circuit depth $d$ for varying system sizes $N_q$. The markers show the XEB averaged over $N_c=50$ different circuit instances, and the error bars show the standard errors. (d) Classification accuracy of the pure-state-trained QuAN$_{50}$ on pure-state data. We train $8$ independent models at each circuit depth $d$ and show the averaged accuracy (markers) and the standard error (error bars). QuAN$_{50}$ successfully learns the relative complexity of $d=8$. (e) A comparison of the performances of QuAN$_2$, QuAN$_{50}$, and other architectures in learning the relative complexity of depth $d=8$ on an $N_q=25$ qubit system. (f) Averaged XEB for experimentally collected bit-strings. We show the XEB averaged over $50$ circuit instances (markers) and the standard error (error bars). The XEB decays smoothly as a function of depth $d$. (g) Learning relative complexity from experimental data using QuAN$_{50}$ trained on noiseless data.
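
The XEB equation referenced in panels (c) and (f) is not reproduced in this section. As a point of reference, the sketch below assumes the commonly used linear cross-entropy benchmark, $2^{N_q}\langle p_{\rm ideal}(x_i)\rangle - 1$, which may differ from the paper's exact definition. The function `linear_xeb` and the two-qubit toy distributions are illustrative assumptions.

```python
import math

def linear_xeb(ideal_probs, samples, n_qubits):
    """Linear XEB estimate: 2^n times the mean ideal probability of the
    sampled bit-strings, minus 1 (assumed definition, see lead-in)."""
    mean_p = sum(ideal_probs[x] for x in samples) / len(samples)
    return (2 ** n_qubits) * mean_p - 1.0

n_qubits = 2

# Case 1: a flat ideal distribution gives XEB = 0 for any set of samples.
uniform = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
xeb_uniform = linear_xeb(uniform, ["00", "11", "01", "01"], n_qubits)

# Case 2: samples drawn in exact proportion to a peaked ideal distribution.
peaked = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}
samples = ["00"] * 4 + ["01"] * 2 + ["10"] + ["11"]
xeb_peaked = linear_xeb(peaked, samples, n_qubits)
```

In the second case the estimate equals $2^{N_q}\sum_x p(x)^2 - 1 = 0.375$, since the samples match the ideal distribution exactly. Noisy samples that drift toward uniform would pull the value toward zero.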
Figure 4: Learning the relative complexity of decodable and undecodable states of the toric code. (a) The transformation from the $Z$-basis measurements to the smallest-loop, plaquette variables. (b) QuAN can build larger closed loops through multiplication. (c,d) The decodability phase diagram of the toric code state under coherent and incoherent noise for two different set sizes: $N=1$ in (c) and $N=64$ in (d). The regions in the phase space that support the training data are marked with hatch marks. The average confidence $\bar{y}$ is averaged over $10$ independent training runs. The known thresholds are marked along the $g_X=0$ axis at $p_c\approx 0.11$ and along the $p_\text{flip}=0$ axis at $g_c\approx 0.22$. (e) Average confidence $\bar{y}$ by QuAN$_2$ for different set sizes $N$, and by SMLP with $N=64$, along the axis $g_X=0$. The error bars show the standard error of $\bar{y}$ over $10$ independent training runs. (f) Average confidence $\bar{y}$ by QuAN$_2$ with varying set sizes $N$, and by SMLP with $N=64$, along the axis $p_\text{flip}=0$. (g) Average confidence $\bar{y}$ by QuAN$_2$ and PAB with $N=64$ along the axis $g_X=0$, where PAB is defined as the model without self-attention that has only pooling attention. (h) Average confidence $\bar{y}$ by QuAN$_2$ and PAB with $N=64$ along the axis $p_\text{flip}=0$. (i) Pooling attention score histogram from the topological state with $(g_X, p_\text{flip})=(0, 0.05)$. (j) The loop expectation value $\langle Z_\text{closed}\rangle$ as a function of the loop perimeter, for high and low attention score snapshots in the topological state with $(g_X, p_\text{flip})=(0, 0.05)$. The error bars represent the standard error of $\langle Z_\text{closed}\rangle$ over different loop configurations in the corresponding snapshots.
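
Panels (a) and (b) rest on a simple identity: multiplying neighboring plaquette variables cancels their shared edge, leaving the product of $Z$ values around the larger loop. The sketch below checks this on a random $\pm 1$ snapshot. The edge indexing (`h` for horizontal edges, `v` for vertical edges, periodic boundaries) is an assumed convention and not the paper's qubit layout, and the random snapshot is not a toric code sample.

```python
import random

def plaquette(h, v, r, c):
    """Product of the four +/-1 edge values around the plaquette whose
    top-left vertex is (r, c), on an L x L lattice with periodic boundaries."""
    size = len(h)
    top = h[r][c]
    bottom = h[(r + 1) % size][c]
    left = v[r][c]
    right = v[r][(c + 1) % size]
    return top * bottom * left * right

# Random +/-1 snapshot (Z-basis outcomes 0/1 mapped to eigenvalues +1/-1).
size = 4
rng = random.Random(0)
h = [[rng.choice((-1, 1)) for _ in range(size)] for _ in range(size)]
v = [[rng.choice((-1, 1)) for _ in range(size)] for _ in range(size)]

# Larger loop built by multiplying two horizontally adjacent plaquettes.
loop_from_plaquettes = plaquette(h, v, 1, 1) * plaquette(h, v, 1, 2)

# The same loop computed directly from the six edges on its boundary;
# the shared edge v[1][2] is absent because its square is +1.
loop_from_edges = (h[1][1] * h[1][2] * h[2][1] * h[2][2]
                   * v[1][1] * v[1][3])
```

The two values agree for any snapshot, which is why a network that receives plaquette variables as input can reach larger closed loops through products alone.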