Discovering autonomous quantum error correction via deep reinforcement learning

Yue Yin, Tailong Xiao, Xiaoyang Deng, Ming He, Jianping Fan, Guihua Zeng

TL;DR

This work addresses the challenge of discovering practical AQEC codes for bosonic systems under realistic loss channels by combining curriculum learning with deep reinforcement learning (DRL). It introduces a semi-analytical master-equation solver to accelerate training and proposes a two-phase curriculum that first identifies encodings exceeding the breakeven fidelity $F_{be}$ and then optimizes their long-term stability. The key finding is the GRL code with logical states $|0_L\rangle=|4\rangle$ and $|1_L\rangle=|7\rangle$, and an engineered Lindblad operator $L_{\text{eng}}\propto|3\rangle\langle 2|+|4\rangle\langle 3|+|6\rangle\langle 5|+|7\rangle\langle 6|$, which maintains high fidelity even with double-photon loss, in agreement with KL-condition analysis for the error set $\text{E}$. The results show improved robustness to phase and amplitude damping, and feasible experimental implementation with short Hamiltonian distance ($d_g=1$), indicating strong potential for fault-tolerant quantum memories. Overall, curriculum-learning guided DRL offers a powerful framework for discovering adaptable, high-performance AQEC codes in early fault-tolerant quantum systems.
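
To make the role of the engineered operator concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds $L_{\text{eng}}$ (up to normalization) in a truncated Fock space and checks that it returns the states reached after one or two photon losses to the codewords $|4\rangle$ and $|7\rangle$. The truncation dimension `N` and the helper names `fock`, `transition`, and `normalize` are assumptions introduced for illustration only.

```python
import numpy as np

N = 10  # Fock-space truncation (assumption; must exceed the highest codeword, 7)

def fock(n, dim=N):
    """Vector for the Fock state |n> in a truncated space."""
    v = np.zeros(dim)
    v[n] = 1.0
    return v

def transition(m, n, dim=N):
    """Operator |m><n|."""
    return np.outer(fock(m, dim), fock(n, dim))

def normalize(v):
    """Rescale a nonzero vector to unit norm."""
    return v / np.linalg.norm(v)

# Annihilation operator: a|n> = sqrt(n)|n-1>
a = np.diag(np.sqrt(np.arange(1, N)), k=1)

# Engineered jump operator from the summary, up to an overall constant
L_eng = transition(3, 2) + transition(4, 3) + transition(6, 5) + transition(7, 6)

codewords = {"0_L": fock(4), "1_L": fock(7)}
for name, ket in codewords.items():
    one_loss = normalize(a @ ket)       # state after a single photon loss
    two_loss = normalize(a @ a @ ket)   # state after a double photon loss
    rec1 = L_eng @ one_loss             # one recovery jump
    rec2 = L_eng @ L_eng @ two_loss     # two recovery jumps
    dark = np.allclose(L_eng @ ket, 0)  # codeword is untouched by L_eng
    print(name, np.allclose(rec1, ket), np.allclose(rec2, ket), dark)
```

Each printed line reports `True True True`: a single application of `L_eng` undoes one photon loss, two applications undo a double loss, and the codewords themselves are annihilated by `L_eng`, so the recovery does not disturb an error-free code state.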

Abstract

Quantum error correction is essential for fault-tolerant quantum computing. However, standard methods relying on active measurements may introduce additional errors. Autonomous quantum error correction (AQEC) circumvents this by utilizing engineered dissipation and drives in bosonic systems, but identifying practical encodings remains challenging due to the stringent Knill-Laflamme conditions. In this work, we use curriculum-learning-enabled deep reinforcement learning to discover bosonic codes under an approximate AQEC framework that resist both single-photon and double-photon losses. We present an analytical solution of the master equation under approximation conditions, which can significantly accelerate the training process of reinforcement learning. The agent first identifies an encoded subspace surpassing the breakeven point through rapid exploration within a constrained evolution time, then fine-tunes its policy to sustain this performance advantage over extended time horizons. We find that the two-phase trained agent can discover the optimal set of codewords, i.e., the Fock states $\ket{4}$ and $\ket{7}$, when both single-photon and double-photon loss are taken into account. The discovered code surpasses the breakeven threshold over a longer evolution time and achieves state-of-the-art performance. We also analyze the robustness of the code against phase damping and amplitude damping noise. Our work highlights the potential of curriculum-learning-enabled deep reinforcement learning for discovering optimal quantum error correction codes, especially in early fault-tolerant quantum systems.
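
As a point of reference for the breakeven point mentioned above, here is a hedged sketch of one common convention, which may differ from the exact definition used in the paper. It takes the breakeven fidelity to be the average fidelity of an unencoded qubit stored in the Fock states $|0\rangle$ and $|1\rangle$ under single-photon loss, averaged over the six cardinal Bloch-sphere states. The function names and the choice of the six-state average are assumptions for illustration.

```python
import numpy as np

def amplitude_damping(rho, gamma_t):
    """Single-photon loss on a {|0>, |1>} qubit for dimensionless time gamma_t."""
    p = 1.0 - np.exp(-gamma_t)  # probability that |1> has decayed to |0>
    k0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - p)]])
    k1 = np.array([[0.0, np.sqrt(p)], [0.0, 0.0]])
    return k0 @ rho @ k0.conj().T + k1 @ rho @ k1.conj().T

def cardinal_states():
    """The six cardinal states of the Bloch sphere as state vectors."""
    s = 1.0 / np.sqrt(2.0)
    return [
        np.array([1.0, 0.0], dtype=complex),
        np.array([0.0, 1.0], dtype=complex),
        s * np.array([1.0, 1.0], dtype=complex),
        s * np.array([1.0, -1.0], dtype=complex),
        s * np.array([1.0, 1.0j], dtype=complex),
        s * np.array([1.0, -1.0j], dtype=complex),
    ]

def breakeven_fidelity(gamma_t):
    """Average of Tr[rho_0 rho_t] over the six cardinal states (assumed convention)."""
    fids = []
    for psi in cardinal_states():
        rho0 = np.outer(psi, psi.conj())
        rho_t = amplitude_damping(rho0, gamma_t)
        fids.append(np.real(np.trace(rho0 @ rho_t)))
    return float(np.mean(fids))

for gt in (0.06, 0.24, 0.6, 4.2):
    print(gt, breakeven_fidelity(gt))
```

Under this convention the average has the closed form $(3 + e^{-\gamma t} + 2e^{-\gamma t/2})/6$, so an encoded qubit "beats breakeven" at time $t$ when its average fidelity exceeds this value.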

Paper Structure

This paper contains 13 sections, 43 equations, 12 figures, 2 tables, 1 algorithm.

Figures (12)

  • Figure 1: Schematic diagram of a typical AQEC system, including storage cavity $A$, transmon ancilla $q$ and readout $R$.
  • Figure 2: Hardware-module energy-level diagram of the approximate AQEC process. This diagram illustrates the physical "pump-and-dump" mechanism for a single recovery cycle. The labels 1-4 depict the error and recovery process of an AQEC code.
  • Figure 3: Definition of $\rho^{(m)}$ in the case of $N=3$
  • Figure 4: Diagram of training the PPO agent with the quantum environment. (a) The training process of the GRL code in each episode. Training lasts for 300k episodes. In each episode, the quantum environment first creates the six basic quantum states $s_1$ from the codeword generated by the agent's first action $a_1$, and takes $s_{t-1}$ as the initial states in each following step. These states evolve for a time $dt=0.06/\gamma_a$; the density matrices $\rho_i$ $(i=1,\cdots,6)$ and the codeword are then used to compute the fidelity $f_t=(\text{Tr}[\rho_{1,0}\rho_{1,t}],\cdots,\text{Tr}[\rho_{6,0}\rho_{6,t}])$. The observation $o_t=(f_t,a_t,a_1,\gamma_b/\gamma_a, \gamma_{a2}/\gamma_a,g/\gamma_a)$ is fed back to the agent, which determines a new codeword $a_{t+1}$. This process continues until the evolution time reaches $t=4.2/\gamma_a$, at which point a new episode begins. Throughout training, the agent receives a reward $r_t$ for every action it takes, based on the fidelity achieved. The agent's objective is to maximize the reward, which in turn helps it converge towards the optimal codeword. (b) Comparison of RL performance with and without CL. Without CL, because of the excessively long simulation time $\gamma_a t=4.2$, the initial steps' rewards diminish significantly due to the discount factor, making it challenging for the RL model to learn an initial density matrix that surpasses the breakeven fidelity. Consequently, RL fails to effectively enhance the overall reward (red curve). In contrast, with CL, the simulation time of each episode is first restricted to a shorter duration ($\gamma_a t\le 0.24$), encouraging the RL model to discover effective encodings exceeding the breakeven threshold (green curve). In the second phase, the maximum simulation duration is gradually extended; training continues as long as the average fidelity remains above the breakeven point. This strategy enables RL to explore increasingly stable encodings based on the achievements of the first phase, sustaining fidelities above the breakeven level for longer periods (blue curve).
  • Figure 5: (a) The fidelity distribution $F(\theta,\phi,t)$ of the GRL code and other codes (T4C [gertler_protecting_2021], binomial [hu_quantum_2019], and RL [zeng_approximate_2023]), solved analytically as a function of the Bloch angles $\theta$ and $\phi$ with steps $\pi/10$ and $\pi/20$, respectively. The result is shown for $\gamma_a t=0.6$ and $\lambda=10^4$. (b) Average reward (solid blue line) as a function of training episode. The evolution time for each step is $\gamma_a t=0.06$. (c) Comparison of AQEC performance of: breakeven, the T4C code, the lowest-order binomial code (Bin. code in the figure), the GRL code, and the RL code. For each code, performance is compared considering single-photon loss (dashed lines) and double-photon loss (solid lines) to show robustness against higher-order photon loss.
  • ...and 7 more figures
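
The per-step observation and the two-phase curriculum described in the Figure 4 caption can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' training code. The step length $0.06$, the phase-one cap $0.24$, and the full horizon $4.2$ (all in units of $1/\gamma_a$) come from the caption. The interpretation of the "six basic quantum states" as the six cardinal logical states, the helper names, and the horizon-extension rule in `next_horizon` (including its increment) are hypothetical.

```python
import numpy as np

DT = 0.06        # evolution time per step, in units of 1/gamma_a (from the caption)
T_PHASE1 = 0.24  # phase-one cap on gamma_a * t (from the caption)
T_FULL = 4.2     # full episode length in gamma_a * t (from the caption)

def steps_for(horizon, dt=DT):
    """Number of agent steps in an episode of the given horizon."""
    return int(round(horizon / dt))

def logical_cardinal_states(ket0, ket1):
    """Density matrices of six cardinal logical states (assumed meaning of s_1)."""
    s = 1.0 / np.sqrt(2.0)
    kets = [ket0, ket1,
            s * (ket0 + ket1), s * (ket0 - ket1),
            s * (ket0 + 1j * ket1), s * (ket0 - 1j * ket1)]
    return [np.outer(k, np.conj(k)) for k in kets]

def fidelity_vector(rho_init, rho_now):
    """f_t = (Tr[rho_{i,0} rho_{i,t}]) for i = 1..6, as in the caption."""
    return np.array([np.real(np.trace(r0 @ rt))
                     for r0, rt in zip(rho_init, rho_now)])

def next_horizon(current, mean_fidelity, breakeven, increment=DT, cap=T_FULL):
    """Hypothetical phase-two rule: lengthen the episode only while the
    average fidelity stays above breakeven."""
    if mean_fidelity > breakeven:
        return min(current + increment, cap)
    return current

# Example with the GRL codewords |4> and |7> in a 10-level truncation (assumption)
ket4 = np.zeros(10); ket4[4] = 1.0
ket7 = np.zeros(10); ket7[7] = 1.0
rho0 = logical_cardinal_states(ket4, ket7)
print(steps_for(T_PHASE1), steps_for(T_FULL))
print(fidelity_vector(rho0, rho0))
```

With these numbers, a phase-one episode contains 4 agent steps and a full-length episode contains 70, which illustrates why rewards from the earliest steps carry much less weight in the long-horizon setting once a discount factor is applied.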