Reinforcement Learning Control of Quantum Error Correction

Volodymyr Sivak, Alexis Morvan, Michael Broughton, Matthew Neeley, Alec Eickbusch, Dmitry Abanin, Amira Abbas, Rajeev Acharya, Laleh Aghababaie Beni, Georg Aigeldinger, Ross Alcaraz, Sayra Alcaraz, Trond I. Andersen, Markus Ansmann, Frank Arute, Kunal Arya, Walt Askew, Nikita Astrakhantsev, Juan Atalaya, Brian Ballard, Joseph C. Bardin, Hector Bates, Andreas Bengtsson, Majid Bigdeli Karimi, Alexander Bilmes, Simon Bilodeau, Felix Borjans, Alexandre Bourassa, Jenna Bovaird, Dylan Bowers, Leon Brill, Peter Brooks, David A. Browne, Brett Buchea, Bob B. Buckley, Tim Burger, Brian Burkett, Nicholas Bushnell, Jamal Busnaina, Anthony Cabrera, Juan Campero, Hung-Shen Chang, Silas Chen, Ben Chiaro, Liang-Ying Chih, Agnetta Y. Cleland, Bryan Cochrane, Matt Cockrell, Josh Cogan, Roberto Collins, Paul Conner, Harold Cook, Rodrigo G. Cortiñas, William Courtney, Alexander L. Crook, Ben Curtin, Martin Damyanov, Sayan Das, Dripto M. Debroy, Sean Demura, Paul Donohoe, Ilya Drozdov, Andrew Dunsworth, Valerie Ehimhen, Aviv Moshe Elbag, Lior Ella, Mahmoud Elzouka, David Enriquez, Catherine Erickson, Vinicius S. Ferreira, Marcos Flores, Leslie Flores Burgos, Ebrahim Forati, Jeremiah Ford, Austin G. Fowler, Brooks Foxen, Masaya Fukami, Alan Wing Lun Fung, Lenny Fuste, Suhas Ganjam, Gonzalo Garcia, Christopher Garrick, Robert Gasca, Helge Gehring, Robert Geiger, Élie Genois, William Giang, Dar Gilboa, James E. Goeders, Edward C. Gonzales, Raja Gosula, Stijn J. de Graaf, Alejandro Grajales Dau, Dietrich Graumann, Joel Grebel, Alex Greene, Jonathan A. Gross, Jose Guerrero, Loïck Le Guevel, Tan Ha, Steve Habegger, Tanner Hadick, Ali Hadjikhani, Michael C. Hamilton, Matthew P. Harrigan, Sean D. Harrington, Jeanne Hartshorn, Stephen Heslin, Paula Heu, Oscar Higgott, Reno Hiltermann, Hsin-Yuan Huang, Mike Hucka, Christopher Hudspeth, Ashley Huff, William J. 
Huggins, Evan Jeffrey, Shaun Jevons, Zhang Jiang, Xiaoxuan Jin, Chaitali Joshi, Pavol Juhas, Andreas Kabel, Dvir Kafri, Hui Kang, Kiseo Kang, Amir H. Karamlou, Ryan Kaufman, Kostyantyn Kechedzhi, Tanuj Khattar, Mostafa Khezri, Seon Kim, Can M. Knaut, Bryce Kobrin, Fedor Kostritsa, John Mark Kreikebaum, Ryuho Kudo, Ben Kueffler, Arun Kumar, Vladislav D. Kurilovich, Vitali Kutsko, Nathan Lacroix, David Landhuis, Tiano Lange-Dei, Brandon W. Langley, Pavel Laptev, Kim-Ming Lau, Justin Ledford, Joy Lee, Kenny Lee, Brian J. Lester, Wendy Leung, Lily Li, Wing Yan Li, Ming Li, Alexander T. Lill, William P. Livingston, Matthew T. Lloyd, Aditya Locharla, Laura De Lorenzo, Daniel Lundahl, Aaron Lunt, Sid Madhuk, Aniket Maiti, Ashley Maloney, Salvatore Mandrà, Leigh S. Martin, Orion Martin, Eric Mascot, Paul Masih Das, Dmitri Maslov, Melvin Mathews, Cameron Maxfield, Jarrod R. McClean, Matt McEwen, Seneca Meeks, Kevin C. Miao, Zlatko K. Minev, Reza Molavi, Sebastian Molina, Shirin Montazeri, Charles Neill, Michael Newman, Anthony Nguyen, Murray Nguyen, Chia-Hung Ni, Murphy Yuezhen Niu, Logan Oas, Raymond Orosco, Kristoffer Ottosson, Alice Pagano, Agustin Di Paolo, Sherman Peek, David Peterson, Alex Pizzuto, Elias Portoles, Rebecca Potter, Orion Pritchard, Michael Qian, Chris Quintana, Arpit Ranadive, Matthew J. Reagor, Rachel Resnick, David M. Rhodes, Daniel Riley, Gabrielle Roberts, Roberto Rodriguez, Emma Ropes, Lucia B. De Rose, Eliott Rosenberg, Emma Rosenfeld, Dario Rosenstock, Elizabeth Rossi, Pedram Roushan, David A. Rower, Robert Salazar, Kannan Sankaragomathi, Murat Can Sarihan, Kevin J. Satzinger, Max Schaefer, Sebastian Schroeder, Henry F. Schurkus, Aria Shahingohar, Michael J. Shearn, Aaron Shorter, Noah Shutty, Vladimir Shvarts, Spencer Small, W. Clarke Smith, David A. 
Sobel, Barrett Spells, Sofia Springer, George Sterling, Jordan Suchard, Aaron Szasz, Alexander Sztein, Madeline Taylor, Jothi Priyanka Thiruraman, Douglas Thor, Dogan Timucin, Eifu Tomita, Alfredo Torres, M. Mert Torunbalci, Hao Tran, Abeer Vaishnav, Justin Vargas, Sergey Vdovichev, Guifre Vidal, Catherine Vollgraff Heidweiller, Meghan Voorhees, Steven Waltman, Jonathan Waltz, Shannon X. Wang, Brayden Ware, James D. Watson, Yonghua Wei, Travis Weidel, Theodore White, Kristi Wong, Bryan W. K. Woo, Christopher J. Wood, Maddy Woodson, Cheng Xing, Z. Jamie Yao, Ping Yeh, Bicheng Ying, Juhwan Yoo, Noureldin Yosri, Elliot Young, Grayson Young, Adam Zalcman, Ran Zhang, Yaxing Zhang, Ningfeng Zhu, Nicholas Zobrist, Zhenjie Zou, Ryan Babbush, Dave Bacon, Sergio Boixo, Yu Chen, Zijun Chen, Michel Devoret, Monica Hansen, Jeremy Hilton, Cody Jones, Julian Kelly, Alexander N. Korotkov, Erik Lucero, Anthony Megrant, Hartmut Neven, William D. Oliver, Ganesh Ramachandran, Vadim Smelyanskiy, Paul V. Klimov

TL;DR

This work unifies calibration with computation, granting the quantum error correction process a dual role: its error detection events are not only used to correct the logical quantum state, but are also repurposed as a learning signal, teaching a reinforcement learning agent to continuously steer the physical control parameters and stabilize the quantum system during the computation.

Abstract

The promise of fault-tolerant quantum computing is challenged by environmental drift that relentlessly degrades the quality of quantum operations. The contemporary solution, halting the entire quantum computation for recalibration, is unsustainable for the long runtimes of the future algorithms. We address this challenge by unifying calibration with computation, granting the quantum error correction process a dual role: its error detection events are not only used to correct the logical quantum state, but are also repurposed as a learning signal, teaching a reinforcement learning agent to continuously steer the physical control parameters and stabilize the quantum system during the computation. We experimentally demonstrate this framework on a superconducting processor, improving the logical error rate stability of the surface code 3.5-fold against injected drift and pushing the performance beyond what is achievable with state-of-the-art traditional calibration and human-expert tuning. Simulations of surface codes up to distance-15 confirm the scalability of our method, revealing an optimization speed that is independent of the system size. This work thus enables a new paradigm: a quantum computer that learns to self-improve directly from its errors and never stops computing.

Paper Structure

This paper contains 8 sections, 5 figures.

Figures (5)

  • Figure 1: Overview of RL control. (a) Hierarchy of the feedback loops in control of an error-corrected quantum system. The low-level loop with analog control and readout signals (purple) occurs on a time scale of one QEC cycle; the logical algorithm's digital feedback loop (green) occurs on a time scale of the decoding latency; the learning feedback loop (pink), presented in this work, is not synchronized with the lower levels, and occurs on a time scale determined by the relevant system drift. The indicated time scales are characteristic of the superconducting-circuit quantum computing platform (Ref. acharya2024quantum). (b) A small space-time chunk of the QEC circuit for the repetition code, highlighting two overlapping detecting regions. (c) One iteration of the learning process. In each epoch, a batch of control policy candidates is sampled from the policy distribution. A certain number of QEC cycles is executed with each policy candidate (shades of red and blue). The acquired QEC data is used to compute rewards by estimating error detection rates for each detector. This information, indicating the relative performance of each policy candidate, is converted by the learning algorithm into a small gradient step of the policy distribution. Then, a new batch of policy candidates is sampled and the process repeats.
  • Figure 2: Optimization with surrogate objective. (a) While LER is the principal measure of quality of the QEC process, it is impractical to use during optimization (see main text). This motivates adoption of a surrogate objective $C$. (b) Finite-difference partial derivatives experimentally evaluated along random directions in the control parameter space confirm the linear relation between the gradient of the true and surrogate objectives, with proportionality coefficient $(d+1)/2$ (black line). (c) The surrogate objective allows us to effectively utilize the sparse dependence of error detection rates on the system control parameters, represented here as a factor graph. The detector nodes are connected to the learnable control parameters of the gates within their respective detecting regions. In our distance-5 surface code experiment, on average each detector node is connected to $302$ parameter nodes, and each parameter node to $18$ detector nodes.
  • Figure 3: RL fine-tuning of QEC performance. (a) Systematic improvement of LER from RL fine-tuning applied after an exhaustive conventional calibration process, with five independent runs each for the surface code and the color code (grey), mean performance (teal) and one-sigma deviation (shaded region). (b) Decay of the logical observable in a quantum memory experiment, averaged over $X$ and $Z$ bases, for the surface code and the color code. The reference curves (grey) use QEC syndrome data from Refs. acharya2024quantum, lacroix2025scaling, here reprocessed with the Tesseract decoder for consistent comparison (see main text). For Ref. acharya2024quantum, we selected the best-performing among the distance-5 codes. Note that better results were achieved in Refs. acharya2024quantum, lacroix2025scaling using a more accurate neural network decoder; see Supp. Mat. VI.
  • Figure 4: Demonstration of RL steering. (a) The data qubits (gold diamonds) and measure qubits (panels with data) are arranged in the layout of a distance-5 surface code. We inject artificial drift on the gates indicated with red shapes (circle, diamond, bars) and observe elevated error detection signals where expected (colored background). The detection rate (DR) associated with each measure qubit is normalized for visualization to remove the effect of the natural system drift. Performance of the fixed control policy (maroon) degrades over time due to the injected drift, while RL steering (blue) stabilizes and maintains the DR below its initial level (white lines). (b) Time-dependence of the injected drift in system control parameters (dashed) and RL steering (solid). An exponential fit of the RL response to step-like drift in the XY pulse amplitude yields a characteristic learning time of $130$ epochs. (c) Periodic evaluation of the logical performance indicates that RL steering of the system significantly suppresses and stabilizes the LER (see main text). Additionally incorporating the decoder steering (black) further improves these results.
  • Figure 5: Real-time steering and scaling simulations. (a) Normalized improvement (color) of detection rate in the real-time steering simulation of the distance-3 surface code subject to sinusoidal drift at different frequencies. Level $1$ indicates the performance of the optimal policy. The isoline at level $0$ demarcates the boundary beyond which real-time steering results in better performance than a fixed policy, approaching the performance of the optimal policy in the regime of slow drift. (b) Simulation of scalability of RL control of large surface codes. The algorithm learns the parameters of single-qubit and CZ gates, with $30$ control parameters per gate, amounting to almost $40{,}000$ control parameters in total for the distance-15 code. During the learning process, the logical error rate reduces over time (color) until it reaches the floor (red bars) set by the irreducible physical error rates and characterized by the error suppression factor $\Lambda^*$. (c) Point-estimates of $\Lambda$ at every code distance and learning epoch from (b) confirm that the speed, $\partial_t \Lambda / \Lambda^* \times 10^2$, at which the error suppression factor approaches the local optimum, is proportional to the distance from the optimum, $1-\Lambda/\Lambda^*$. The convergence rate $\gamma$ is independent of the system size but depends on the number of control parameters per gate, with three beams corresponding to 1, 10, and 30 parameters, and the linear fits (red) indicating the convergence rates.
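
The learning iteration described in Figure 1(c) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the policy distribution is taken to be a Gaussian with a learnable mean and fixed width, the update is a plain score-function (REINFORCE-style) gradient step with a mean-reward baseline, and the hardware is replaced by a hypothetical toy model (`estimate_detection_rates`) in which each detector's rate grows with the miscalibration of the parameters connected to it. All function names and numerical values are illustrative.

```python
import numpy as np


def estimate_detection_rates(params, optimum, mask, rng, shots=2000):
    """Toy stand-in for the hardware (hypothetical model, not from the paper).

    Each detector's true detection rate grows quadratically with the
    miscalibration of the parameters it is connected to (mask[d, p] == 1).
    The rate is then estimated from a finite number of binary events.
    """
    miscalibration = (params - optimum) ** 2                 # shape (n_params,)
    true_rates = np.clip(0.05 + mask @ miscalibration, 0.0, 0.5)
    return rng.binomial(shots, true_rates) / shots           # shape (n_detectors,)


def run_learning(n_params=8, n_detectors=6, batch=20, epochs=300,
                 sigma=0.05, lr=0.05, seed=0):
    """Return the mean squared miscalibration of the policy mean per epoch."""
    rng = np.random.default_rng(seed)
    optimum = rng.normal(0.0, 0.1, n_params)   # unknown ideal control parameters
    # Sparse detector-parameter connectivity (assumed, random for illustration).
    mask = (rng.random((n_detectors, n_params)) < 0.4).astype(float)
    # Make sure every parameter influences at least one detector.
    mask[rng.integers(n_detectors, size=n_params), np.arange(n_params)] = 1.0

    mean = np.zeros(n_params)                  # mean of the Gaussian policy
    history = []
    for _ in range(epochs):
        # 1. Sample a batch of policy candidates from the policy distribution.
        candidates = mean + sigma * rng.standard_normal((batch, n_params))
        # 2. "Run QEC" with each candidate and estimate detection rates.
        rates = np.array([estimate_detection_rates(c, optimum, mask, rng)
                          for c in candidates])
        # 3. Reward: lower average detection rate is better.
        rewards = -rates.mean(axis=1)
        advantages = rewards - rewards.mean()  # baseline reduces variance
        # 4. Score-function gradient with respect to the Gaussian mean.
        grad = (advantages[:, None] * (candidates - mean)).mean(axis=0) / sigma**2
        mean = mean + lr * grad                # small gradient step
        history.append(float(np.mean((mean - optimum) ** 2)))
    return history


if __name__ == "__main__":
    history = run_learning()
    print(f"first epoch: {history[0]:.2e}, last epoch: {history[-1]:.2e}")
```

In this toy setting the policy mean drifts toward the optimum using only detection-rate estimates, without ever evaluating a logical error rate, which is the qualitative point of the loop in Figure 1(c).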
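
The proportionality described in Figure 5(c) can be read as a first-order differential equation. As a short worked step, assuming continuous time $t$ measured in epochs, a constant rate $\gamma$, and the notation $\Lambda_0 = \Lambda(0)$ (introduced here for illustration only):

$$
\frac{1}{\Lambda^*}\,\frac{\partial \Lambda}{\partial t} = \gamma\left(1 - \frac{\Lambda}{\Lambda^*}\right)
\quad\Longrightarrow\quad
\Lambda(t) = \Lambda^* - \left(\Lambda^* - \Lambda_0\right) e^{-\gamma t}.
$$

Under these assumptions the remaining gap $\Lambda^* - \Lambda$ shrinks by a factor of $e$ every $1/\gamma$ epochs, so a size-independent $\gamma$ corresponds to a learning time that does not grow with code distance.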
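
The "almost $40{,}000$" parameter count quoted in Figure 5(b) can be checked against one plausible accounting. The sketch below assumes a rotated surface code with $d^2$ data qubits and $d^2-1$ measure qubits, one learnable single-qubit gate per qubit, and one learnable CZ gate per coupled data-measure pair. This accounting is an assumption made for illustration; the source states only the per-gate and total figures.

```python
def surface_code_parameter_count(d, params_per_gate=30):
    """Count control parameters under an assumed gate accounting.

    Assumptions (not stated in the source):
      - rotated surface code: d*d data qubits and d*d - 1 measure qubits
      - one learnable single-qubit gate per qubit
      - one learnable CZ gate per coupled data-measure pair
    """
    n_qubits = 2 * d * d - 1
    # (d-1)^2 weight-4 plus 2(d-1) weight-2 stabilizers give 4*d*(d-1) pairs.
    n_cz_pairs = 4 * d * (d - 1)
    return (n_qubits + n_cz_pairs) * params_per_gate


if __name__ == "__main__":
    print(surface_code_parameter_count(15))  # 38670
```

With these assumptions the distance-15 code has 449 qubits and 840 CZ pairs, giving 38,670 parameters, consistent with the caption's "almost $40{,}000$".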