Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning
Austin Yubo He, Zi-Wen Liu
TL;DR
This work tackles the practical challenge of designing fault-tolerant quantum codes by minimizing the check weight $w$ of quantum LDPC codes while preserving distance $d$ and rate. It introduces a reinforcement learning framework that operates on Tanner graphs: an agent trained with proximal policy optimization (PPO) and action masking iteratively adds or removes edges and is rewarded for reducing node degree while safeguarding $d$, with the distance evaluation combined with a normalization scheme to stabilize learning. Applied to hypergraph product codes built from classical codes with $n\le 30$ (the HGP-30 set), the approach discovers a large set of new low-weight codes with distances in the tens and substantially lower physical qubit overhead. Compared with prior weight-reduction methods and recent RL-based code designs, the framework reduces overhead by up to about $73\times$ and reaches distances up to $d\approx 40$, making near-term experimental implementations more feasible and enabling scalable exploration of code parameters. The results demonstrate that reinforcement learning can effectively advance quantum code discovery in finite-size regimes and offer a flexible framework for future refinements, such as spectral-gap rewards and adaptation to other code families and experimental constraints.
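To make the quantities in the summary concrete, the following minimal sketch builds a hypergraph product code from two classical parity-check matrices using the standard construction and reports its maximum check weight and qubit degree. It is an illustration only, not the authors' pipeline: the function names (`hypergraph_product`, `max_check_weight`, `max_qubit_degree`, `num_logical`, `gf2_rank`) are hypothetical, and the exact definition of $w$ used in the paper may differ from the row and column weights computed here.

```python
import numpy as np

def hypergraph_product(h1, h2):
    """Standard hypergraph product of classical parity checks h1 (r1 x n1), h2 (r2 x n2)."""
    h1 = np.asarray(h1, dtype=int) % 2
    h2 = np.asarray(h2, dtype=int) % 2
    r1, n1 = h1.shape
    r2, n2 = h2.shape
    # H_X = [H1 (x) I_n2 | I_r1 (x) H2^T],  H_Z = [I_n1 (x) H2 | H1^T (x) I_r2]
    hx = np.hstack([np.kron(h1, np.eye(n2, dtype=int)),
                    np.kron(np.eye(r1, dtype=int), h2.T)])
    hz = np.hstack([np.kron(np.eye(n1, dtype=int), h2),
                    np.kron(h1.T, np.eye(r2, dtype=int))])
    return hx, hz

def gf2_rank(mat):
    """Rank over GF(2) by Gaussian elimination."""
    m = np.array(mat, dtype=int) % 2
    rows, cols = m.shape
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivots = np.nonzero(m[rank:, c])[0]
        if pivots.size == 0:
            continue
        p = rank + pivots[0]
        m[[rank, p]] = m[[p, rank]]          # move pivot row into place
        for r in np.nonzero(m[:, c])[0]:
            if r != rank:
                m[r] ^= m[rank]              # clear column c in every other row
        rank += 1
    return rank

def max_check_weight(hx, hz):
    """Largest number of qubits touched by any single X or Z check (row weight)."""
    return int(max(hx.sum(axis=1).max(), hz.sum(axis=1).max()))

def max_qubit_degree(hx, hz):
    """Largest number of checks (X and Z combined) acting on any single qubit."""
    return int(np.vstack([hx, hz]).sum(axis=0).max())

def num_logical(hx, hz):
    """Number of logical qubits k = n - rank(H_X) - rank(H_Z)."""
    return hx.shape[1] - gf2_rank(hx) - gf2_rank(hz)

# Example: the 3-bit repetition code with itself gives a 13-qubit surface code.
H_rep = np.array([[1, 1, 0], [0, 1, 1]])
HX, HZ = hypergraph_product(H_rep, H_rep)
assert not ((HX @ HZ.T) % 2).any()           # X and Z checks commute
print(HX.shape[1], num_logical(HX, HZ), max_check_weight(HX, HZ), max_qubit_degree(HX, HZ))
# prints: 13 1 4 4
```

In a weight-reduction setting like the one summarized above, edge edits to the Tanner graph would change the rows and columns of such matrices, and quantities like these would feed the reward. Computing the distance $d$ is considerably harder and is not attempted in this sketch.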
Abstract
The realization of scalable fault-tolerant quantum computing is expected to hinge on quantum error-correcting codes. In the quest for more efficient quantum fault tolerance, a critical code parameter is the weight of the measurements that extract information about errors to enable error correction: because higher-weight measurements cost more to implement and introduce more errors, optimizing measurement weight is an important part of code design. This underlies the surging interest in quantum low-density parity-check (qLDPC) codes, the study of which has primarily focused on asymptotic (large-code-limit) properties. In this work, we introduce a versatile and computationally efficient approach to stabilizer code weight reduction based on reinforcement learning (RL), which produces new low-weight codes that substantially outperform the state of the art in practically relevant parameter regimes, extending significantly beyond previously accessible small distances. For example, our approach reduces physical qubit overhead relative to existing results by one to two orders of magnitude for weight-6 codes and brings the overhead into a feasible range for near-future experiments. We also investigate the interplay between code parameters using our RL framework, offering new insights into the potential efficiency and power of practically viable coding strategies. Overall, our results demonstrate how RL can effectively advance the crucial yet challenging problem of quantum code discovery and thereby facilitate a faster path to the practical implementation of fault-tolerant quantum technologies.
