Averting multi-qubit burst errors in surface code magic state factories

Jason D. Chadwick; Christopher Kang; Joshua Viszlai; Sophia Fuhui Lin; Frederic T. Chong

TL;DR

The paper addresses multi-qubit burst errors caused by cosmic ray impacts and two-level system (TLS) scrambling in superconducting quantum hardware, focusing on magic state factories, which are expected to comprise up to 95% of the space cost of future quantum programs. It introduces a low-overhead software strategy: detect bursts by counting stabilizer syndromes in spatiotemporal windows, take the affected physical qubits offline, and re-map the magic state factory (five logical qubits plus routing space) onto the remaining qubits, with a buffer of distilled $T$ states covering the detection latency. Relative to the Code expansion and Distributed baselines, re-mapping cuts ray-induced overhead by several orders of magnitude and total qubitcycle cost by a geometric mean of 6.5× to 13.9×, depending on the noise model (the upper figure, ~13.9×, is reached under ideal detection). The approach tolerates nonzero detection latency, though performance degrades when detection is slow, especially under TLS scrambling. The method reduces the hardware burden of magic state distillation and may generalize to other transient error sources, making it a practical software-level mitigation for fault-tolerant quantum hardware.
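
The role of the distilled $T$ buffer can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each distilled state is held for a fixed number of cycles (`hold_cycles`, chosen to be at least the detection latency) before it is released, and that a detection event discards everything still held. The class name, method names, and hold-time policy are illustrative assumptions, not the paper's implementation.

```python
from collections import deque


class DistilledTBuffer:
    """Hypothetical holding buffer for distilled T states.

    A state is only released once it has aged past `hold_cycles`, so a burst
    that is detected late can still invalidate states distilled after impact.
    """

    def __init__(self, hold_cycles):
        self.hold_cycles = hold_cycles
        self._states = deque()  # completion cycle of each held state, oldest first

    def push(self, cycle):
        """Record a state whose distillation finished at `cycle`."""
        self._states.append(cycle)

    def release(self, now):
        """Remove and return states that have been held for at least hold_cycles."""
        ready = []
        while self._states and now - self._states[0] >= self.hold_cycles:
            ready.append(self._states.popleft())
        return ready

    def on_burst_detected(self):
        """Discard every state still in the buffer; return how many were dropped."""
        dropped = len(self._states)
        self._states.clear()
        return dropped


buf = DistilledTBuffer(hold_cycles=100)
for finished_at in (0, 50, 120):
    buf.push(finished_at)
print(buf.release(now=130))      # [0]: only the first state has aged 100 cycles
print(buf.on_burst_detected())   # 2: the two younger states are discarded
```

A longer hold time protects against slower detection at the cost of more buffer space, which is the trade-off the buffer is meant to manage.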

Abstract

Fault-tolerant quantum computation relies on the assumption of time-invariant, sufficiently low physical error rates. However, current superconducting quantum computers suffer from frequent disruptive noise events, including cosmic ray impacts and shifting two-level system defects. Several methods have been proposed to mitigate these issues in software, but they add large overheads in terms of physical qubit count, as it is difficult to preserve logical information through burst error events. We focus on mitigating multi-qubit burst errors in magic state factories, which are expected to comprise up to 95% of the space cost of future quantum programs. Our key insight is that magic state factories do not need to preserve logical information over time; once we detect an increase in local physical error rates, we can simply turn off parts of the factory that are affected, re-map the factory to the new chip geometry, and continue operating. This is much more efficient than previous more general methods, and is resilient even under many simultaneous impact events. Using precise physical noise models, we show an efficient ray detection method and evaluate our strategy in different noise regimes. Compared to existing baselines, we find reductions in ray-induced overheads by several orders of magnitude, reducing total qubitcycle cost by geomean 6.5x to 13.9x depending on the noise model. This work reduces the burden on hardware by providing low-overhead software mitigation of these errors.

Paper Structure (27 sections, 12 figures)

Introduction
Background
Two-level systems and cosmic rays
Error correction and the surface code
Magic state distillation in the surface code
Noise models
Physical qubit noise model
Modeling cosmic rays
Direct interaction models
TLS scrambling
Detecting burst error events
Burst error detection via spatial windowing
Detector performance
Re-mapping magic state factories
Distilled T buffer
...and 12 more sections

Figures (12)

Figure 1: Overview of our method, showing the timeline of a magic state factory before and after a cosmic ray event. (a) The factory is operating normally. (b) A cosmic ray hits the chip, severely affecting physical qubit error rates nearby and causing the factory to output low-quality magic states. (c) The cosmic ray is detected after some delay. The distilled magic states in the buffer (yellow) are discarded, the affected physical qubits are turned offline, and the factory is re-mapped to avoid using the offline areas. The factory can continue to operate at reduced speed. (d) The affected physical qubits recover from the ray impact and are turned back online. The factory resumes normal operation.
Figure 2: Background on magic state distillation. (a) A high-fidelity magic state can be consumed in a fault-tolerant circuit to perform a logical T gate. (b) A surface code patch is a rectangular grid of physical qubits that can act as one logical qubit. (c) The magic state factory layout from litinski_magic_2019, made up of five logical qubits (dark blue) and two routing space channels (light blue). (d) One step of a distillation. In this step, three faulty T rotations are applied to the qubits. 15-to-1 distillation uses 15 faulty T states in 6 rounds of operations, and outputs one higher-fidelity T state at the end.
Figure 3: The two cosmic ray noise models studied in this work. From left to right, we show the effect of a representative ray on the physical qubit $T_1$ times, the decoherence error rate over $3\,\mu$s (the time period used in mcewen_resolving_2022), and the chance that each surface code stabilizer measurement will detect an error. Top: The Direct model is based on mcewen_resolving_2022. A ray impact directly affects the $T_1$ times of qubits within some radius, becoming less severe with distance. Bottom: The Scrambling model is based on wilen_correlated_2021 and thorbeck_two-level-system_2023. A ray impact scrambles TLS defects in some area, leading to unpredictable and long-lasting effects on qubit coherence.
Figure 4: We detect cosmic rays by counting stabilizer syndromes in spatiotemporal windows. Left: Each stabilizer produces a syndrome when its measured parity differs from the expected value. We can count the number of syndromes in a $w_s \times w_s \times w_t$ window, and trigger a ray detection event if this count exceeds some threshold. Right: When a detection event is triggered, we turn off all physical qubits within the offline flag radius $r_{\text{off}}$.
Figure 5: The number of distillations required after a ray impact to flag every physical qubit within the Direct ray radius with high probability. Detection becomes significantly harder as radius and ray severity both decrease. Annotation marks the ray impact rate and approximate radius observed by mcewen_resolving_2022.
...and 7 more figures
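
The windowed detection described in the Figure 4 caption can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions: syndromes are stored as a boolean array indexed by (round, row, column), stabilizers and physical qubits share one grid for simplicity, and the function names, the choice of window centre as the flag origin, and the example parameter values are all hypothetical rather than taken from the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_counts(syndromes, w_s, w_t):
    """Number of syndromes in every w_t x w_s x w_s (time, row, col) window."""
    windows = sliding_window_view(syndromes, (w_t, w_s, w_s))
    return windows.sum(axis=(-3, -2, -1))


def detect_bursts(syndromes, w_s, w_t, threshold):
    """Return (round, row, col) for each window whose count exceeds threshold.

    The round is the last round of the window; row and col are the window's
    spatial centre (an assumed convention for where to place the flag).
    """
    counts = window_counts(syndromes, w_s, w_t)
    half = (w_s - 1) / 2
    return [
        (int(t) + w_t - 1, float(r) + half, float(c) + half)
        for t, r, c in np.argwhere(counts > threshold)
    ]


def offline_mask(grid_shape, events, r_off):
    """Boolean mask of grid sites within r_off of any detection event."""
    rows, cols = np.indices(grid_shape)
    mask = np.zeros(grid_shape, dtype=bool)
    for _, r0, c0 in events:
        mask |= (rows - r0) ** 2 + (cols - c0) ** 2 <= r_off ** 2
    return mask


# Deterministic toy input: a quiet 8x8 grid over 8 rounds, with a 2x2 patch
# that fires a syndrome on every round from round 4 onward.
syndromes = np.zeros((8, 8, 8), dtype=bool)
syndromes[4:8, 2:4, 2:4] = True

events = detect_bursts(syndromes, w_s=2, w_t=4, threshold=15)
mask = offline_mask((8, 8), events, r_off=1)
print(events)           # [(7, 2.5, 2.5)]
print(int(mask.sum()))  # 4
```

In this toy case only the window fully covering the noisy patch reaches 16 syndromes, so a single event is raised and the four sites around its centre are flagged offline. A real detector would have to balance the window size and threshold against false positives from ordinary background noise.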
