Quantum Circuit Discovery for Fault-Tolerant Logical State Preparation with Reinforcement Learning

Remmy Zen; Jan Olle; Luis Colmenarez; Matteo Puviani; Markus Müller; Florian Marquardt

TL;DR

The paper tackles the challenge of designing fault-tolerant (FT) quantum circuits under hardware constraints. It uses reinforcement learning (RL) to discover flag-based FT circuits for quantum error correction. It develops an RL framework in which the circuit state is represented by a stabilizer tableau and the actions are discrete Clifford gates. The agent is trained with proximal policy optimization and produces both logical state preparation circuits and the verification circuits that make them fault-tolerant. An integrated approach (IFT-LSP) that handles both tasks together shows superior performance. Key findings include RL-generated circuits with fewer gates and lower ancilla overhead across several codes, transfer learning that speeds up adaptation to new qubit connectivities, and successful FT circuit synthesis under restricted connectivity such as 2D grids and heavy-hex layouts. The work demonstrates that RL is a viable route to scalable FT circuit discovery and points toward applications to magic state preparation, syndrome measurement, and logical gate synthesis, which could help make large-scale quantum computers practical.
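To make the tableau representation concrete, here is a minimal Python sketch, not the authors' implementation, of how a stabilizer tableau evolves under the discrete Clifford actions H, S, and CNOT. It tracks only the n stabilizer generators and their signs, using the standard Aaronson-Gottesman update rules. The class and method names are hypothetical. It also does not reduce the tableau to the canonical form that the paper uses as the RL observation.

```python
class StabilizerTableau:
    """Stabilizer generators of an n-qubit state as binary x/z rows plus sign bits.

    Hypothetical helper for illustration; update rules follow Aaronson and Gottesman (2004).
    """

    def __init__(self, n):
        # Start in |0...0>: generator i is Z on qubit i with a + sign.
        self.n = n
        self.x = [[0] * n for _ in range(n)]
        self.z = [[int(i == j) for j in range(n)] for i in range(n)]
        self.r = [0] * n

    def h(self, a):
        for i in range(self.n):
            self.r[i] ^= self.x[i][a] & self.z[i][a]  # H maps Y to -Y
            self.x[i][a], self.z[i][a] = self.z[i][a], self.x[i][a]  # swap X and Z

    def s(self, a):
        for i in range(self.n):
            self.r[i] ^= self.x[i][a] & self.z[i][a]  # S maps Y to -X
            self.z[i][a] ^= self.x[i][a]  # S maps X to Y (and Y to X up to sign)

    def cnot(self, c, t):
        for i in range(self.n):
            self.r[i] ^= self.x[i][c] & self.z[i][t] & (self.x[i][t] ^ self.z[i][c] ^ 1)
            self.x[i][t] ^= self.x[i][c]  # X on control spreads to target
            self.z[i][c] ^= self.z[i][t]  # Z on target spreads to control

    def generators(self):
        """Signed Pauli strings, qubit 0 leftmost."""
        return [
            ("-" if self.r[i] else "+")
            + "".join("IXZY"[self.x[i][q] + 2 * self.z[i][q]] for q in range(self.n))
            for i in range(self.n)
        ]


if __name__ == "__main__":
    tab = StabilizerTableau(3)
    tab.h(0)
    tab.cnot(0, 1)
    tab.cnot(1, 2)
    print(tab.generators())  # ['+XXX', '+ZZI', '+IZZ']
```

In this sketch, H on qubit 0 followed by CNOT(0, 1) and CNOT(1, 2) takes |000⟩ to the state with generators XXX, ZZI, IZZ. This is |000⟩ + |111⟩, the target state used in Figure 3.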

Abstract

The realization of large-scale quantum computers requires not only quantum error correction (QEC) but also fault-tolerant operations to handle errors that propagate into harmful errors. Recently, flag-based protocols have been introduced that use ancillary qubits to flag such harmful errors. However, there is no clear recipe for finding a fault-tolerant quantum circuit with flag-based protocols, especially when hardware constraints such as qubit connectivity and the available gate set are taken into account. In this work, we propose and explore reinforcement learning (RL) to automatically discover compact and hardware-adapted fault-tolerant quantum circuits. We show that in the task of fault-tolerant logical state preparation, RL discovers circuits with fewer gates and ancillary qubits than published results, both without and with hardware constraints, for up to 15 physical qubits. Furthermore, RL allows for straightforward exploration of different qubit connectivities and the use of transfer learning to accelerate the discovery. More generally, our work opens the door towards the use of RL for the discovery of fault-tolerant quantum circuits for tasks beyond state preparation, including magic state preparation, logical gate synthesis, and syndrome measurement.
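The connectivity constraint enters through the set of actions the agent may choose at each step. The following is a minimal sketch under our own assumptions. The helper names are hypothetical, the gate set is H, S, and CNOT as in Figure 2, and CNOT is allowed in both directions on every coupled pair. It shows how that set shrinks when moving from all-to-all coupling to a 2D grid.

```python
from itertools import combinations


def all_to_all_edges(n):
    """Every unordered qubit pair is coupled."""
    return list(combinations(range(n), 2))


def grid_edges(rows, cols):
    """Nearest-neighbour couplings on a rows x cols grid; qubit index = r * cols + c."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            q = r * cols + c
            if c + 1 < cols:
                edges.append((q, q + 1))  # horizontal neighbour
            if r + 1 < rows:
                edges.append((q, q + cols))  # vertical neighbour
    return edges


def action_space(n, edges, single_qubit_gates=("H", "S")):
    """Enumerate discrete Clifford actions permitted by a coupling graph.

    Single-qubit gates may act on any qubit; CNOT is offered in both
    directions, but only on coupled pairs.
    """
    actions = [(g, q) for g in single_qubit_gates for q in range(n)]
    for a, b in edges:
        actions.append(("CNOT", a, b))
        actions.append(("CNOT", b, a))
    return actions


if __name__ == "__main__":
    print(len(action_space(9, all_to_all_edges(9))))  # 90
    print(len(action_space(9, grid_edges(3, 3))))  # 42
```

For nine qubits this gives 90 actions with all-to-all coupling and 42 on a 3x3 grid. Swapping the edge list is, in this sketch, all that is needed to retarget the agent to another connectivity.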

Paper Structure (41 sections, 8 equations, 20 figures, 2 tables)

Introduction
Background
Quantum Error Correction
Logical Qubit Encoding Circuit
Fault-Tolerant State Preparation
Reinforcement Learning
Reinforcement Learning Framework for Quantum Circuit Discovery
Logical State Preparation
Task Description and Reward Function
Results
Verification Circuit Synthesis
Task Description and Reward Function
Results
Integrated Fault-Tolerant Logical State Preparation
Task Description and Reward Function
...and 26 more sections

Figures (20)

Figure 1: Discovery of fault-tolerant logical state preparation circuit with reinforcement learning (RL). Given the target logical state $|\psi\rangle_L$ of a specified $[[n,k,d]]$ code, a gate set, and a qubit connectivity, we use RL to automatically discover circuits for preparing $|\psi\rangle_L$ fault-tolerantly with flag qubits.
Figure 2: The general RL framework in this work. The circuit is the environment, where its state is represented by its stabilizer canonical tableau. At each step, the RL agent observes the environment and applies a discrete Clifford gate as an action from the specified available gate set (e.g., the Hadamard gate $H$, the phase gate $S$, and the CNOT gate), taking into account qubit connectivity constraints. Subsequently, the agent receives a reward depending on the given task and the quality of the proposed circuit.
Figure 3: Description and reward function for the logical state preparation task. (a) The logical state preparation task outputs a circuit $U$ that prepares a target logical state $|\psi\rangle_L$ of a $[[n,k,d]]$ code. (b) Preparation of the state $|\psi_{\rm{target}}\rangle = |000\rangle + |111\rangle$ (normalization factors omitted for simplicity) from the initial state $|\psi_{0}\rangle = |000\rangle$. At each time step $t$ we show the value of three candidate reward functions: the fidelity $|\langle \psi_{t} | \psi_{\rm{target}}\rangle|^2$, the energy $\sum_i \langle \psi_{t} | H_i | \psi_{t} \rangle$ used in xu2021variational, and our proposed complementary tableau distance $1 - d_t$. In this case, the complementary tableau distance increases monotonically, which makes it easier for RL algorithms to learn from than the other functions.
Figure 4: Results for the logical state preparation task. (a) Minimum circuit size achieved by different methods for logical state preparation of different QEC codes with all-to-all qubit connectivity and the $H$, $S$, and $\text{CNOT}$ gates. StabGraph amaro2020scalable does not work for non-CSS codes such as the $[[5,1,3]]$ perfect code. QMAP schneider2023sat could not prepare the states of the $[[15,1,3]]$ and $[[17,1,5]]$ codes within the maximum allotted time of $12$ hours. The inset shows an example of the training progress for preparing the $|0\rangle_L$ state of the $[[7,1,3]]$ Steane code. (b) Circuit size from an RL agent that includes the connectivity and gate set during training (RL Direct), compared with RL-prepared circuits for all-to-all qubit connectivity that have been transpiled with Qiskit qiskittranspiler (RL + Transpile). Results are shown for various IBM Quantum device connectivities manila, jakarta, guadalupe, tokyo using the $\text{CNOT}$, $\sqrt{X}$, $X$, and $S = R_z(\pi / 2)$ gates. The inset shows examples of RL-prepared circuits for the $|0\rangle_L$ state of the $[[5,1,3]]$ perfect code and the $[[7,1,3]]$ Steane code.
Figure 5: The verification circuit synthesis task outputs a circuit $V$ that uses flag qubits to flag harmful errors, thereby making the state preparation fault-tolerant.
...and 15 more figures
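The non-monotonic fidelity behind the Figure 3 comparison can be reproduced with a small pure-Python statevector sketch. The sketch follows one gate sequence chosen for illustration: H on qubit 0, then CNOT(0, 1) and CNOT(1, 2), taking |000⟩ to |000⟩ + |111⟩. At each step it records the fidelity to the target. It also records the sum of expectation values of the target stabilizers XXX, ZZI, and IZZ, which we assume as a stand-in for the energy term; the Hamiltonian and sign convention in xu2021variational may differ. The paper's tableau distance $d_t$ is not defined in this section, so it is not implemented. All function names are hypothetical.

```python
import math

N = 3  # qubits; qubit 0 is the most significant bit of a basis-state index


def bit(index, q):
    """Value of qubit q in computational basis state `index`."""
    return (index >> (N - 1 - q)) & 1


def apply_h(state, q):
    """Hadamard on qubit q (this sketch keeps amplitudes real)."""
    mask = 1 << (N - 1 - q)
    new = [0.0] * len(state)
    for i, amp in enumerate(state):
        i0, i1 = i & ~mask, i | mask
        new[i0] += amp / math.sqrt(2)
        new[i1] += (amp if bit(i, q) == 0 else -amp) / math.sqrt(2)
    return new


def apply_cnot(state, c, t):
    """CNOT with control c and target t: flip bit t wherever bit c is 1."""
    tmask = 1 << (N - 1 - t)
    new = [0.0] * len(state)
    for i, amp in enumerate(state):
        new[i ^ tmask if bit(i, c) else i] += amp
    return new


def expectation(state, pauli):
    """<psi|P|psi> for a Pauli string of I, X, Z only (no Y), e.g. "ZZI"."""
    xmask = sum(1 << (N - 1 - q) for q, p in enumerate(pauli) if p == "X")
    zmask = sum(1 << (N - 1 - q) for q, p in enumerate(pauli) if p == "Z")
    total = 0.0
    for i, amp in enumerate(state):
        sign = -1.0 if bin(i & zmask).count("1") % 2 else 1.0  # Z phases
        total += state[i ^ xmask] * sign * amp  # X flips bits
    return total


def fidelity(state, target):
    """|<psi|target>|^2 for real amplitude vectors."""
    return sum(a * b for a, b in zip(state, target)) ** 2


def trajectory():
    ghz = [0.0] * 2**N
    ghz[0] = ghz[-1] = 1 / math.sqrt(2)
    stabilizers = ["XXX", "ZZI", "IZZ"]
    steps = [
        lambda v: apply_h(v, 0),
        lambda v: apply_cnot(v, 0, 1),
        lambda v: apply_cnot(v, 1, 2),
    ]
    fids, stabs = [], []

    def record(v):
        fids.append(round(fidelity(v, ghz), 6))
        stabs.append(round(sum(expectation(v, p) for p in stabilizers), 6))

    state = [1.0] + [0.0] * (2**N - 1)  # |000>
    record(state)
    for step in steps:
        state = step(state)
        record(state)
    return fids, stabs


if __name__ == "__main__":
    fids, stabs = trajectory()
    print(fids)  # [0.5, 0.25, 0.25, 1.0]
    print(stabs)  # [2.0, 1.0, 1.0, 3.0]
```

Along this sequence both quantities first drop and only recover at the last gate. This is the kind of non-monotonic signal that the caption argues is harder for an RL agent to learn from.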
