Simultaneous Discovery of Quantum Error Correction Codes and Encoders with a Noise-Aware Reinforcement Learning Agent

Jan Olle; Remmy Zen; Matteo Puviani; Florian Marquardt

TL;DR

This work tackles the challenge of hardware-aware quantum error correction (QEC) by automatically discovering both stabilizer codes and their encoding circuits with a noise-aware reinforcement learning agent. It introduces a reward based on the Knill-Laflamme (KL) conditions and a vectorized Clifford simulator, which together scale code discovery to codes as large as $[[n,k,d]] = [[20,13,3]]$ and $[[11,1,5]]$, while a noise-aware meta-agent generalizes strategies across asymmetric depolarizing channels parameterized by the bias $c_Z$. The approach relies on the stabilizer formalism for fast Clifford simulation and supports simultaneous discovery across multiple noise models; a CSS-focused extension substantially reduces memory requirements and further improves scalability. The results recover known codes, reveal diverse code families, and demonstrate transfer learning between noise settings, pointing toward hardware-adapted, accelerated QEC discovery on a broad range of platforms. The work lays a foundation for scalable, hardware-tuned code search and encoding synthesis, and identifies fault-tolerant extensions and larger-scale CSS-based explorations as promising future directions.
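For error detection with a stabilizer code, the Knill-Laflamme conditions reduce to a parity check: a Pauli error is harmless if it anticommutes with at least one stabilizer generator (so it is detected) or belongs to the stabilizer group (so it acts trivially). Errors that commute with every generator but are not stabilizers are undetected logical errors. The sketch below sums the probability of such low-weight errors for a candidate code. It assumes independent single-qubit noise and a hypothetical parametrization of the bias $c_Z$ that may differ from the paper's definition; the function names are illustrative, and this is not the paper's reward equation.

```python
import itertools
import numpy as np

def biased_pauli_probs(p_i, c_z):
    """Single-qubit Pauli probabilities of an asymmetric depolarizing channel.

    ASSUMED parametrization (illustrative only): p_X = p_Y = (1 - p_I) / (2 + c_Z)
    and p_Z = c_Z * p_X, so that c_Z = 1 gives the symmetric channel.
    """
    p_x = (1.0 - p_i) / (2.0 + c_z)
    return {"I": p_i, "X": p_x, "Y": p_x, "Z": c_z * p_x}

def to_symplectic(pauli):
    """Map a Pauli string such as 'ZZI' to binary vectors (x, z)."""
    x = np.array([c in "XY" for c in pauli], dtype=np.uint8)
    z = np.array([c in "ZY" for c in pauli], dtype=np.uint8)
    return x, z

def stabilizer_group(gens, n):
    """All 2^m products of the generators, phases ignored, as hashable keys."""
    group = set()
    for bits in itertools.product([0, 1], repeat=len(gens)):
        x = np.zeros(n, dtype=np.uint8)
        z = np.zeros(n, dtype=np.uint8)
        for b, (gx, gz) in zip(bits, gens):
            if b:
                x ^= gx
                z ^= gz
        group.add((x.tobytes(), z.tobytes()))
    return group

def undetected_probability(generators, p_i, c_z, max_weight):
    """Probability mass of errors of weight <= max_weight that commute with every
    generator but are not stabilizers (undetected logical errors)."""
    n = len(generators[0])
    probs = biased_pauli_probs(p_i, c_z)
    gens = [to_symplectic(g) for g in generators]
    group = stabilizer_group(gens, n)
    total = 0.0
    for chars in itertools.product("IXYZ", repeat=n):
        if sum(c != "I" for c in chars) > max_weight:
            continue
        ex, ez = to_symplectic(chars)
        # symplectic product 1 means the error anticommutes with that generator
        detected = any(int(ex @ gz + ez @ gx) % 2 for gx, gz in gens)
        trivial = (ex.tobytes(), ez.tobytes()) in group
        if not detected and not trivial:
            # independent noise: a Pauli string's probability is a product over qubits
            total += float(np.prod([probs[c] for c in chars]))
    return total

bit_flip = ["ZZI", "IZZ"]  # 3-qubit repetition code, blind to single Z errors
print(undetected_probability(bit_flip, p_i=0.9, c_z=1.0, max_weight=1))
print(undetected_probability(bit_flip, p_i=0.9, c_z=2.0, max_weight=1))
```

The second value is larger: shifting the noise toward $Z$ errors penalizes a code that cannot detect them, which is the kind of signal a noise-aware agent can exploit when $c_Z$ is part of its observation.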

Abstract

In the ongoing race towards experimental implementations of quantum error correction (QEC), finding ways to automatically discover codes and encoding strategies tailored to the qubit hardware platform is emerging as a critical problem. Reinforcement learning (RL) has been identified as a promising approach, but so far it has been severely restricted in terms of scalability. In this work, we significantly expand the power of RL approaches to QEC code discovery. Explicitly, we train an RL agent that automatically discovers both QEC codes and their encoding circuits for a given gate set, qubit connectivity and error model, from scratch. This is enabled by a reward based on the Knill-Laflamme conditions and a vectorized Clifford simulator, allowing us to scale our results to 20 physical qubits and distance 5 codes. Moreover, we introduce the concept of a noise-aware meta-agent, which learns to produce encoding strategies simultaneously for a range of noise models, thus leveraging transfer of insights between different situations. Our approach opens the door towards hardware-adapted accelerated discovery of QEC approaches across the full spectrum of quantum hardware platforms of interest.
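Restricting the agent to Clifford gates means each candidate encoding can be tracked through its stabilizer generators instead of a full state vector. The sketch below is a minimal NumPy version of that idea, assuming the standard Aaronson-Gottesman tableau update rules for H, S and CNOT, and storing the generators of many environments in one array so that every call advances the whole batch at once. The class and method names are hypothetical and are not taken from the authors' simulator. Starting from $Z$ stabilizers on the qubits initialized to $|0\rangle$, the rows left after the circuit are the code generators produced by the encoding circuit.

```python
import numpy as np

class BatchedTableau:
    """Stabilizer generators of B independent Clifford circuits, updated in lockstep.

    x, z: bit arrays of shape (B, m, n); r: sign bits of shape (B, m).
    A pair (x, z) = (1, 1) on a qubit denotes Y.
    """

    def __init__(self, batch, n, m):
        self.x = np.zeros((batch, m, n), dtype=np.uint8)
        self.z = np.zeros((batch, m, n), dtype=np.uint8)
        self.r = np.zeros((batch, m), dtype=np.uint8)
        # generator i is Z on qubit i, i.e. qubits 0..m-1 start in |0>
        self.z[:, np.arange(m), np.arange(m)] = 1
        self.envs = np.arange(batch)

    def h(self, q):
        """Hadamard on qubit q[b] of environment b (q: int array of shape (B,))."""
        e = self.envs
        xq, zq = self.x[e, :, q], self.z[e, :, q]  # fancy indexing returns copies
        self.r ^= xq & zq
        self.x[e, :, q], self.z[e, :, q] = zq, xq

    def s(self, q):
        """Phase gate S on qubit q[b] of environment b."""
        e = self.envs
        xq, zq = self.x[e, :, q], self.z[e, :, q]
        self.r ^= xq & zq
        self.z[e, :, q] = zq ^ xq

    def cnot(self, c, t):
        """CNOT with control c[b] and target t[b] (c[b] != t[b]) in environment b."""
        e = self.envs
        xc, zc = self.x[e, :, c], self.z[e, :, c]
        xt, zt = self.x[e, :, t], self.z[e, :, t]
        self.r ^= xc & zt & (xt ^ zc ^ 1)
        self.x[e, :, t] = xt ^ xc
        self.z[e, :, c] = zc ^ zt

    def strings(self, b):
        """Signed Pauli strings of the generators of environment b."""
        names = {(0, 0): "I", (1, 0): "X", (0, 1): "Z", (1, 1): "Y"}
        rows = []
        for i in range(self.x.shape[1]):
            paulis = "".join(names[(int(a), int(c))]
                             for a, c in zip(self.x[b, i], self.z[b, i]))
            rows.append(("-" if self.r[b, i] else "+") + paulis)
        return rows

# two environments running different circuits in the same calls
tab = BatchedTableau(batch=2, n=2, m=2)
tab.h(np.array([0, 1]))
tab.cnot(np.array([0, 1]), np.array([1, 0]))
print(tab.strings(0))  # ['+XX', '+ZZ']
print(tab.strings(1))  # ['+ZZ', '+XX']
```

Because each environment supplies its own qubit indices, many RL episodes can apply different actions in a single vectorized step; the cost per gate is linear in the number of generators rather than exponential in $n$.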

Paper Structure (28 sections, 38 equations, 15 figures, 1 table)

Introduction
Background
Stabilizer Codes
Code Classification
Reinforcement Learning
Reinforcement Learning Approach to QEC Code Discovery
Encoding Circuit
Reward
Noise-aware meta-agent
Vectorized Clifford simulator
Results
Codes in a symmetric depolarizing noise channel
Noise-aware meta-agent
Towards large code discovery
Conclusions and Outlook
...and 13 more sections

Figures (15)

Figure 1: QEC code and encoding discovery using a noise-aware RL meta-agent. A set of error operators, a gate set and a qubit connectivity are chosen. Different error models can be considered by varying some noise parameters, which are fed to the agent as an observation. The agent then builds a circuit from the available gate set and connectivity that detects the most likely errors of the target error model, guided by a reward based on the Knill-Laflamme QEC conditions (reward equation in the main text). After training, a single RL agent is able to find suitable encodings for different noise models, each of which can encode any state $|\psi\rangle$ of choice.
Figure 2: Discovering codes and encoding circuits for various numbers of physical qubits, logical qubits, and distances (see the main text and the appendix on distance-5 codes for $d=5$). Families of stabilizer codes tailored to symmetric depolarizing noise channels, found with our RL framework. The labels $(x,y)$ indicate the number of non-degenerate $(x)$ and degenerate $(y)$ code families. The circuit size shown is the absolute minimum across all families; different families in general have different minimal circuit sizes. Since further training runs do not increase family populations, it is likely that there are no more stabilizer codes for the shown code parameters.
Figure 3: Influence of connectivity. Characteristics of the 13 families of $[[9,3,3]]$ codes found with our framework, clustered into families distinguished by their quantum weight enumerators (defined in the main text). Families 9 and 13 (*) are degenerate, while the rest are non-degenerate. We trained a total of 10240 agents for each of the two cases. With all-to-all (directed: $\text{CNOT}(i<j)$) connectivity, 9574 agents were successful, while this number dropped to 3808 in the other case. The bars display how these codes are distributed across the families. Codes in the same family found by different agents are not necessarily distinct, so the bars should be read as an indication of the likelihood that a training run finds a code within the family. The points show the mean circuit size within each family, and the error bars show its standard deviation. Notably, even with different connectivities, the families occur with similar likelihoods during training. The corresponding quantum weight enumerators are listed explicitly in the appendix on the $[[9,3,3]]$ quantum weight enumerators.
Figure 4: Performance of the noise-aware RL agent. The agent finds $n=9, k=1$ codes and encoding circuits simultaneously for different levels of noise bias $c_Z$, with single-qubit fidelity $p_I=0.9$. In panels a, b and c, green represents the agent that was post-selected among all trained agents for performing best at minimizing the weighted KL sum, averaged over all $c_Z$ values. Orange represents the agent post-selected for minimizing the failure probability, averaged over $c_Z$. a Weighted KL sum as a function of the noise bias parameter $c_Z$ (best agent: green line). b Failure probability as a function of the noise bias parameter $c_Z$ (best agent: orange line). c Smallest undetected effective weight (the effective code distance is its integer part) as a function of the noise bias parameter $c_Z$. While the two best agents overlap almost perfectly up to $c_Z=1.1$, they diverge beyond that point, leading at $c_Z=2$ to a $d_e=5$ code (green) or a $d_e=4$ code (orange) that perform equally well in terms of failure probability, as seen in b. d Failure probability of the best-performing agent (orange in the other panels), evaluated at larger values of $p_I$ (smaller errors) than those it was trained on.
Figure 5: Characteristics of the 9-qubit codes and encodings found by the noise-aware meta-agent post-selected for minimizing the failure probability. a Encoding circuits: many small gate sequences (highlighted in different colors) are reused across different values of $c_Z$. This is an indication of transfer learning, i.e. the power of the meta-agent. b Code generators $g_i$ corresponding to the encoding circuits, shown without distinguishing between $X$ and $Y$. The code generators $g_i$ vary across different values of $c_Z$. c Associated code family according to the (symmetric) weight enumerators $A$, $B$. The same code family is used for $0.5 \leq c_Z < 0.9$; a switch to a different family occurs at $c_Z=0.9$, and that family is kept up to $c_Z=2$.
...and 10 more figures
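The code families in Figures 2, 3 and 5 are distinguished by quantum weight enumerators. Under the standard definitions for a stabilizer code with stabilizer group $S$ and normalizer $N(S)$ (phases ignored), $A_j$ counts the elements of $S$ of weight $j$ and $B_j$ counts the elements of $N(S)$ of weight $j$; the distance is the smallest $j \ge 1$ with $B_j > A_j$, and the code is degenerate when some nontrivial stabilizer has weight below $d$. The brute-force sketch below enumerates all $4^n$ Pauli operators, so it is only meant for small $n$; the helper names are illustrative, and its normalization may differ from the paper's definition.

```python
import numpy as np

def symplectic(generators):
    """Binary (x, z) matrices of shape (m, n) for Pauli strings like 'XZZXI'."""
    x = np.array([[c in "XY" for c in g] for g in generators], dtype=np.int64)
    z = np.array([[c in "ZY" for c in g] for g in generators], dtype=np.int64)
    return x, z

def weight_enumerators(generators):
    """Return (A, B) as arrays indexed by weight 0..n (generators assumed independent)."""
    gx, gz = symplectic(generators)
    m, n = gx.shape
    # A_j: weights of all 2^m products of generators (the stabilizer group)
    combos = (np.arange(2**m)[:, None] >> np.arange(m)) & 1
    sx, sz = (combos @ gx) % 2, (combos @ gz) % 2
    A = np.bincount((sx | sz).sum(axis=1), minlength=n + 1)
    # B_j: weights of all n-qubit Paulis commuting with every generator (the normalizer)
    idx = np.arange(4**n)[:, None]
    px = (idx >> np.arange(n)) & 1          # low n bits encode the X part
    pz = (idx >> (n + np.arange(n))) & 1    # high n bits encode the Z part
    anticommutes = (px @ gz.T + pz @ gx.T) % 2
    keep = ~anticommutes.any(axis=1)
    B = np.bincount((px | pz)[keep].sum(axis=1), minlength=n + 1)
    return A, B

def distance_and_degeneracy(A, B):
    """Distance = smallest j >= 1 with B_j > A_j; degenerate if a stabilizer has weight < d."""
    d = next(j for j in range(1, len(A)) if B[j] > A[j])
    degenerate = bool(A[1:d].sum() > 0)
    return d, degenerate

A, B = weight_enumerators(["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"])  # [[5,1,3]] code
print(A.tolist(), B.tolist())  # [1, 0, 0, 0, 15, 0] [1, 0, 0, 30, 15, 18]
print(distance_and_degeneracy(A, B))  # (3, False)
```

Two codes with identical $(A, B)$ fall into the same family in this sense, which is why the enumerators serve as a cheap fingerprint for clustering the codes found by many independently trained agents.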