Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning

Austin Yubo He, Zi-Wen Liu

TL;DR

This work tackles the practical challenge of designing fault-tolerant quantum codes by minimizing the check weight $w$ in quantum LDPC codes while preserving distance $d$ and rate. It introduces a reinforcement learning framework that operates on Tanner graphs, using proximal policy optimization (PPO) with action masking to iteratively add or remove edges, rewarding reductions in degree while safeguarding $d$; the distance evaluation is combined with a normalization scheme to stabilize learning. Applied to hypergraph product codes built from classical codes with $n\le 30$ (the HGP-30 set), the approach discovers a large set of new low-weight codes with distances extending into the tens and substantially reduces physical qubit overhead. Compared with prior weight-reduction methods and recent RL-based code designs, the RL framework yields overhead reductions of up to a factor of about $73$ and distances up to $d\approx 40$, making near-term experimental implementations more feasible and enabling scalable exploration of code parameters. The results demonstrate that reinforcement learning can effectively advance quantum code discovery in finite-size regimes and offer a flexible framework for future refinements, including spectral-gap rewards and adaptation to other code families and experimental constraints.
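To make the objects in this summary concrete, the following is a minimal sketch of the standard hypergraph product construction and of the check weight $w$ that the method aims to reduce. The function name `hypergraph_product` and the choice of a length-3 repetition code as input are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def hypergraph_product(h1, h2):
    """Return (HX, HZ) for the hypergraph product of two binary parity-check matrices.

    Textbook construction; not the authors' code.
    """
    h1 = np.asarray(h1, dtype=int) % 2
    h2 = np.asarray(h2, dtype=int) % 2
    r1, n1 = h1.shape
    r2, n2 = h2.shape
    # X-type checks act on n1*n2 + r1*r2 qubits.
    hx = np.hstack([np.kron(h1, np.eye(n2, dtype=int)),
                    np.kron(np.eye(r1, dtype=int), h2.T)])
    # Z-type checks act on the same qubits.
    hz = np.hstack([np.kron(np.eye(n1, dtype=int), h2),
                    np.kron(h1.T, np.eye(r2, dtype=int))])
    return hx % 2, hz % 2

# Illustrative input: parity checks of the length-3 repetition code.
h_rep = np.array([[1, 1, 0], [0, 1, 1]])
hx, hz = hypergraph_product(h_rep, h_rep)

# CSS condition: every X check commutes with every Z check.
assert not ((hx @ hz.T) % 2).any()

n_qubits = hx.shape[1]
# Check weight w = largest number of qubits touched by a single check.
max_weight = int(max(hx.sum(axis=1).max(), hz.sum(axis=1).max()))
print(n_qubits, max_weight)  # 13 4
```

Each check of the product has weight equal to a row weight of one input matrix plus a column weight of the other, which is why lowering the degrees in the classical Tanner graphs lowers $w$ for the quantum code.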

Abstract

The realization of scalable fault-tolerant quantum computing is expected to hinge on quantum error-correcting codes. In the quest for more efficient quantum fault tolerance, a critical code parameter is the weight of measurements that extract information about errors to enable error correction: as higher measurement weights require higher implementation costs and introduce more errors, it is important in code design to optimize measurement weight. This underlies the surging interest in quantum low-density parity-check (qLDPC) codes, the study of which has primarily focused on the asymptotic (large-code-limit) properties. In this work, we introduce a versatile and computationally efficient approach to stabilizer code weight reduction based on reinforcement learning (RL), which produces new low-weight codes that substantially outperform the state of the art in practically relevant parameter regimes, extending significantly beyond previously accessible small distances. For example, our approach demonstrates savings in physical qubit overhead compared to existing results by 1 to 2 orders of magnitude for weight 6 codes and brings the overhead into a feasible range for near-future experiments. We also investigate the interplay between code parameters using our RL framework, offering new insights into the potential efficiency and power of practically viable coding strategies. Overall, our results demonstrate how RL can effectively advance the crucial yet challenging problem of quantum code discovery and thereby facilitate a faster path to the practical implementation of fault-tolerant quantum technologies.

Paper Structure

This paper contains 10 sections, 6 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: An illustration of our RL scheme. The RL agent (left) maintains a policy network that, given the state of the Tanner graph, selects an action of adding or removing an edge. The environment (right) updates the graph accordingly and returns a reward based on the code’s new distance and weight. This reward signal is then used to update the policy network, guiding the agent toward better code designs.
  • Figure 2: Reinforcement learning-driven code design. (a) Training trajectories of codes with varying parameters averaged over 3 runs. (b) Evolution of parameters in the three example codes throughout a single episode. (c) Exploration of 10 episodes (represented by different colors) over PCA decomposition of state space.
  • Figure 3: Parallel coordinates plot comparing hypergraph product base codes (blue) and RL-optimized codes (red) after weight reduction. For each color, 475 codes (including 10 high-distance ones beyond the HGP-30 regime) with varying parameters are shown. Each vertical axis is normalized to the maximum observed value for that parameter, and each line traces a single code’s parameters across all axes.
  • Figure 4: Comparisons of codes discovered by our RL-based scheme and existing weight reduction methods. (top) Comparison with Hastings [hastings_quantum_2023] (data taken from Ref. [sabo_weight-reduced_2024]) and SOTA results from Sabo et al. [sabo_weight-reduced_2024]. (bottom) Comparison with SOTA results on all hypergraph product codes constructed from $n\leq 30$ classical codes. Explicit code parameters are shown in Tables \ref{tab:qldpc-subscript} and \ref{tab:qldpc-comparison}.
  • Figure 5: Breakdown of overhead factors shown by heatmaps at varying $n$, $k$, $d$ parameters. The top and bottom rows correspond to codes discovered by our RL weight-reduction scheme and Sabo et al.'s method, respectively. Color gradients are binned for ease of visualization and are not exact representations of the overhead factors, as the varying scales indicate.
  • ...and 9 more figures
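
The agent-environment loop described in the Figure 1 caption can be illustrated with a toy environment. This is a sketch under stated assumptions, not the authors' implementation: the class `ToyTannerEnv`, the masking rule (never remove the last edge of a node), the reward shape, and the use of a tiny classical code with brute-force distance are all hypothetical choices made for illustration.

```python
import itertools
import numpy as np

def classical_distance(h):
    """Brute-force minimum distance of ker(h) over GF(2); tiny codes only."""
    n = h.shape[1]
    weights = [sum(x) for x in itertools.product((0, 1), repeat=n)
               if any(x) and not (h @ np.array(x) % 2).any()]
    return min(weights) if weights else 0

class ToyTannerEnv:
    """Hypothetical edge-toggling environment on a Tanner graph.

    The state is the biadjacency matrix h (rows: checks, columns: bits).
    """

    def __init__(self, h):
        self.h = np.array(h, dtype=int) % 2
        self.target_d = classical_distance(self.h)  # distance to safeguard

    def action_mask(self):
        """True where toggling edge (check, bit) is allowed.

        Assumed rule: never remove the last edge of a check or bit node.
        """
        row_deg = self.h.sum(axis=1, keepdims=True)
        col_deg = self.h.sum(axis=0, keepdims=True)
        last_edge = (self.h == 1) & ((row_deg == 1) | (col_deg == 1))
        return ~last_edge.ravel()

    def step(self, action):
        """Toggle one edge (add if absent, remove if present); return (state, reward)."""
        if not self.action_mask()[action]:
            raise ValueError("masked action")
        r, c = divmod(action, self.h.shape[1])
        self.h[r, c] ^= 1
        d = classical_distance(self.h)
        max_w = int(self.h.sum(axis=1).max())
        # Illustrative reward: penalize check weight, more heavily if distance drops.
        reward = -max_w if d >= self.target_d else -10 * max_w
        return self.h.copy(), reward

env = ToyTannerEnv([[1, 1, 1], [0, 1, 1]])   # distance-2 code, max check weight 3
mask = env.action_mask()                      # edge (0, 0) is masked: last edge of bit 0
new_h, reward = env.step(1)                   # remove edge (check 0, bit 1)
print(reward, classical_distance(new_h))      # -2 3
```

In a PPO setup, the mask would typically be applied to the policy logits before sampling so that invalid edge toggles are never selected; the exhaustive distance computation here is only viable for very small codes.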