
Scalable testing of quantum error correction

John Zhuoyang Ye, Jens Palsberg

TL;DR

This paper tackles the scalability problem in benchmarking quantum error correction (QEC) by introducing ScaLER, which combines stratified fault injection with S-curve extrapolation to estimate logical error rates (LERs) beyond the reach of existing tools such as Stim. By concentrating sampling on high-weight fault subspaces, where logical errors occur often enough to observe, and fitting a predictive S-curve, ScaLER produces accurate LER estimates at larger code distances (up to distance 17) within a practical two-hour desktop budget, as demonstrated on surface, toric, and QLDPC codes. The work contributes a formal modeling framework for per-weight logical error rates, a practical "sweet spot" concept for balancing accuracy against cost, and an end-to-end algorithm with an open-source implementation. Overall, ScaLER offers a general, scalable way to benchmark high-quality QEC implementations, enabling faster assessment and comparison of fault-tolerant schemes under realistic noise models. Because it extrapolates from high-weight data while keeping its estimates accurate, the method could substantially accelerate the development and validation of fault-tolerant quantum architectures.

Abstract

The standard method for benchmarking quantum error correction is randomized fault-injection testing. The state-of-the-art tool Stim is efficient for error-correction implementations with distances of up to 10, but scales poorly to larger distances at low physical error rates. In this paper, we present a scalable approach that combines stratified fault injection with extrapolation. Our insight is that part of the fault space can be sampled efficiently, after which extrapolation is sufficient to complete the testing task. As a result, our tool scales to distance 17 for a physical error rate of 0.0005 with a two-hour time budget on a desktop. For this case, it estimated a logical error rate of $1.51 \times 10^{-11}$ with high confidence.
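Stratified fault injection splits the fault space by weight $w$ (the number of injected faults) and estimates a per-weight logical error rate $P_L^w$ in each stratum. A minimal sketch of how such per-weight rates can be recombined into one LER follows. It assumes, as a simplification, that each of $N$ fault locations faults independently with probability $p$, so the weight is binomially distributed and $P_L = \sum_w \binom{N}{w} p^w (1-p)^{N-w} P_L^w$. The function names `log_binom_pmf` and `stratified_ler` are hypothetical illustrations, not ScaLER's API.

```python
import math


def log_binom_pmf(n: int, w: int, p: float) -> float:
    """Log of C(n, w) * p^w * (1 - p)^(n - w), computed in log space.

    Working with logarithms avoids overflow in C(n, w) and underflow in p^w
    when n is large (many fault locations) and p is small.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 <= w <= n:
        raise ValueError("weight w must satisfy 0 <= w <= n")
    return (math.lgamma(n + 1) - math.lgamma(w + 1) - math.lgamma(n - w + 1)
            + w * math.log(p) + (n - w) * math.log1p(-p))


def stratified_ler(n_locations: int, p: float, per_weight_ler: dict[int, float]) -> float:
    """Combine per-weight logical error rates P_L^w into an overall LER.

    Assumption (simplified noise model): every location faults independently
    with probability p, so Pr[W = w] is Binomial(n_locations, p) and
    P_L = sum_w Pr[W = w] * P_L^w.  Weights absent from per_weight_ler are
    treated as contributing 0 (e.g. low weights where the circuit is
    fault-tolerant and no logical error can occur).
    """
    total = 0.0
    for w, p_w in per_weight_ler.items():
        total += math.exp(log_binom_pmf(n_locations, w, p)) * p_w
    return total


# Toy usage with made-up numbers: 4 locations, p = 0.5, and only weight-2
# faults cause a logical error, half of the time.
print(stratified_ler(4, 0.5, {2: 0.5}))
```

In practice the dictionary would hold sampled rates for the weights that can be sampled efficiently and extrapolated rates for the rest; the log-space binomial keeps the sum numerically stable at the small $p$ and large $N$ the abstract targets.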

Paper Structure (59 sections, 19 equations, 19 figures, 7 tables, 2 algorithms)

Figures (19)

  • Figure 1: Diagram of injecting faults under the SID error model into a $[[3,1,3]]$ repetition-code circuit that protects the logical $|0\rangle$ state. There are $11$ error locations, indexed $n_i$ for $i \in [1,11]$. Injecting the fault $n_7=X$ triggers detector $D_1=1$ after propagation. Injecting the fault $n_1=Y$ triggers detector $D_0=1$ and also flips the observable result, $O_0=1$.
  • Figure 2: We run Stim on the surface code with distance $7$ and physical error rate $5\times 10^{-4}$ and plot, for each weight, the number of samples (red bars) and the number of logical errors (blue bars). Below $w=3$, the circuit is fault-tolerant and Stim cannot sample any logical errors. Beverland et al. and Carolyn Mayer et al. report similar observations in their recent papers [mayer2025rareeventsimulationquantum].
  • Figure 3: S-curve data (Surface code $d{=}7$, $p=5\times 10^{-4}$) combining Stim samples and ScaLER extensions.
  • Figure 4: We fit the data collected from small-scale QEC circuits ($d=3,5,7$) for the surface code, toric code, and bivariate bicycle code with both the S-curve model under the single-qubit depolarization noise model and . Both models fit the observed data closely for all three codes. We use the $R^2$ score to evaluate each curve fit: $R^2>0.99$ in all cases (Table \ref{tab:Rsquare}).
  • Figure 5: Y-curve transformation of panel (c) via $\ln\!\left(\frac{1}{2P_L^w}-1\right)$ (Eq. \ref{eq:YcurveTrans}); blue: Stim; red: ScaLER.
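The Y-curve transformation in Figure 5 maps a per-weight LER $P_L^w \in (0, 1/2)$ to $y = \ln\!\left(\frac{1}{2P_L^w}-1\right)$, and it inverts as $P_L^w = \frac{1}{2(1+e^{y})}$, so an S-curve that saturates at $1/2$ becomes an unbounded quantity that is easier to fit and extrapolate. The sketch below shows the transform, its inverse, and an ordinary least-squares fit scored with $R^2$. In real use, $P_L^w$ would come from sampled counts (logical errors divided by samples per weight, as in Figure 2). The straight-line model of $y$ against $w$ is a hypothetical stand-in chosen for brevity; it is not the paper's S-curve model (Definition 3), whose form is not reproduced here, and all function names are illustrative.

```python
import math


def y_transform(p_w: float) -> float:
    """Y-curve transform y = ln(1 / (2 P_L^w) - 1), defined for 0 < P_L^w < 1/2."""
    if not 0.0 < p_w < 0.5:
        raise ValueError("per-weight LER must lie strictly between 0 and 1/2")
    return math.log(1.0 / (2.0 * p_w) - 1.0)


def inverse_y_transform(y: float) -> float:
    """Invert the transform: P_L^w = 1 / (2 (1 + e^y)); approaches 1/2 as y -> -inf."""
    return 1.0 / (2.0 * (1.0 + math.exp(y)))


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares for y = a + b x; returns (a, b, R^2).

    Hypothetical stand-in model; R^2 here is measured in the transformed (y) space.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Usage with synthetic, noise-free data generated from an assumed line
# y = 4.0 - 0.3 w (made-up numbers for illustration, not results from the paper).
weights = [6, 8, 10, 12]
ler_by_weight = [inverse_y_transform(4.0 - 0.3 * w) for w in weights]
a, b, r2 = fit_line(weights, [y_transform(p) for p in ler_by_weight])
# Extrapolate to a low weight that direct sampling could not reach.
print(f"a={a:.3f}, b={b:.3f}, R^2={r2:.6f}, P_L^3 ~ {inverse_y_transform(a + b * 3):.3e}")
```

Fitting in $y$-space rather than directly on $P_L^w$ is what makes extrapolation toward low weights tractable: tiny rates near zero become large, well-separated $y$ values instead of numbers that all crowd against the axis.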

Theorems & Definitions (3)

  • Definition 1: Modeling S-Curves
  • Definition 2: IBM's S-Curve Model
  • Definition 3: Our S-Curve Model