Millisecond-Scale Calibration and Benchmarking of Superconducting Qubits

Malthe A. Marciniak; Rune T. Birke; Johann B. Severin; Fabrizio Berritta; Daniel Kjær; Filip Nilsson; Smitha N. Themadath; Sangeeth Kallatt; James L. Webb; Kristoffer Bentsen; Tonny Madsen; Zhenhai Sun; Svend Krøjer; Christopher W. Warren; Jacob Hastrup; Morten Kjaergaard

TL;DR

The paper addresses the problem of calibrating superconducting qubits whose parameters drift on sub-second timescales. It develops an on-FPGA calibration and analysis workflow that co-locates pulse generation, data acquisition, analysis, and feed-forward. It introduces Analytical Decay Estimation (ADE) for rapid inference of exponential decays and Sparse Phase Estimation (SPE) for robust estimation of sine-like responses, supplemented by FPGA implementations of Nelder-Mead optimization and golden-section search. The resulting primitives are fast: T1 estimation in ~10 ms, readout optimization in ~100 ms, π-train amplitude correction in ~1 ms, and Clifford randomized benchmarking in ~107 ms. They are validated in a 6-hour continuous closed-loop recalibration run that reduces average gate infidelity by ~6.4% and tracks drift in coherence and control parameters. The results demonstrate substantial gains in calibration speed and resilience to environmental drift, indicating that millisecond-timescale autonomous calibration is practical for scalable quantum processors and can guide future multi-qubit extensions.
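To make the idea of closed-form decay inference concrete, the following is a minimal sketch of a generic three-point estimator for the model y(t) = A·exp(-t/τ) + B sampled at three equally spaced times. It is an illustration under stated assumptions and not necessarily the authors' ADE: the function name `three_point_decay`, its arguments, and the choice of equally spaced samples are all hypothetical. The estimator relies on the identity (y1 - y2)/(y0 - y1) = exp(-Δt/τ), which removes both the amplitude and the offset without an iterative fit.

```python
import math


def three_point_decay(t0, dt, y0, y1, y2):
    """Closed-form estimate of (tau, amp, offset) for y(t) = amp*exp(-t/tau) + offset.

    Hypothetical helper: assumes samples y0, y1, y2 were taken at
    t0, t0 + dt and t0 + 2*dt. Noise is not averaged here; in practice
    each y would already be a shot-averaged value.
    """
    d01 = y0 - y1
    d12 = y1 - y2
    if d01 == 0:
        raise ValueError("first two samples are equal; no decay is resolvable")
    ratio = d12 / d01  # equals exp(-dt / tau) for noiseless data
    if not 0.0 < ratio < 1.0:
        raise ValueError("samples are inconsistent with a decaying exponential")
    tau = -dt / math.log(ratio)
    decay_at_t0 = math.exp(-t0 / tau)
    amp = d01 / (decay_at_t0 * (1.0 - ratio))
    offset = y0 - amp * decay_at_t0
    return tau, amp, offset


# Usage with synthetic, noiseless data (values chosen only for illustration).
true_tau, true_amp, true_offset = 20e-6, 0.9, 0.05
times = [0.0, 15e-6, 30e-6]
ys = [true_amp * math.exp(-t / true_tau) + true_offset for t in times]
tau_est, amp_est, offset_est = three_point_decay(times[0], 15e-6, *ys)
```

Because the estimate needs only two subtractions, one division, and one logarithm, this kind of estimator is the sort of computation that fits naturally on hardware with limited arithmetic resources; the price is a higher sensitivity to noise than a dense least-squares fit.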

Abstract

Superconducting qubit parameters drift on sub-second timescales, motivating calibration and benchmarking techniques that can be executed on millisecond timescales. We demonstrate an on-FPGA workflow that co-locates pulse generation, data acquisition, analysis, and feed-forward, eliminating CPU round trips. Within this workflow, we introduce sparse-sampling and on-FPGA inference tools, including computationally efficient methods for estimation of exponential and sine-like response functions, as well as on-FPGA implementations of Nelder-Mead optimization and golden-section search. These methods enable low-latency primitives for readout calibration, spectroscopy, pulse-amplitude calibration, coherence estimation, and benchmarking. We deploy this toolset to estimate $T_1$ in 10 ms, optimize readout parameters in 100 ms, optimize pulse amplitudes in 1 ms, and perform Clifford randomized gate benchmarking in 107 ms on a flux-tunable superconducting transmon qubit. Running a closed-loop on-FPGA recalibration protocol continuously for 6 hours enables more than 74,000 consecutive recalibrations and yields gate errors that consistently retain better performance than the baseline initial calibration. Correlation analysis shows that recalibration suppresses coupling of gate error to control-parameter drift while preserving a coherence-linked performance. Finally, we quantify uncertainty versus time-to-decision under our sparse sampling approaches and identify optimal parameter regimes for efficient estimation of qubit and pulse parameters.
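Golden-section search, named in the abstract as one of the on-FPGA routines, is a standard bracketing method for a unimodal function. The sketch below is a plain software version written for maximization, as one would use to locate a spectroscopy peak. It is not the authors' FPGA implementation, and the names `golden_section_max`, `lo`, `hi`, and `tol` are assumptions made for illustration. The property that makes the method attractive when each evaluation is a measurement is that every iteration reuses one previous evaluation and requires only one new one.

```python
import math

# Inverse golden ratio, about 0.618; it satisfies 1 - INV_PHI == INV_PHI**2,
# which is what lets one interior point be reused at every iteration.
INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0


def golden_section_max(f, lo, hi, tol=1e-3, max_iter=100):
    """Return an estimate of the maximizer of a unimodal f on [lo, hi]."""
    x1 = hi - INV_PHI * (hi - lo)
    x2 = lo + INV_PHI * (hi - lo)
    f1, f2 = f(x1), f(x2)
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        if f1 < f2:
            # The maximum lies in [x1, hi]; the old x2 becomes the new x1.
            lo = x1
            x1, f1 = x2, f2
            x2 = lo + INV_PHI * (hi - lo)
            f2 = f(x2)
        else:
            # The maximum lies in [lo, x2]; the old x1 becomes the new x2.
            hi = x2
            x2, f2 = x1, f1
            x1 = hi - INV_PHI * (hi - lo)
            f1 = f(x1)
    return 0.5 * (lo + hi)


def lorentzian(x, center=0.3, width=0.5):
    """Synthetic stand-in for a spectroscopy response (illustrative only)."""
    return 1.0 / (1.0 + ((x - center) / width) ** 2)


peak = golden_section_max(lorentzian, -2.0, 2.0, tol=1e-4)
```

The bracket shrinks by a factor of about 0.618 per iteration, so the number of measurements grows only logarithmically with the requested resolution, in contrast to a dense grid sweep whose cost grows linearly.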

Paper Structure (20 sections, 22 equations, 7 figures)

Sub-second coherence dynamics
Rapid FPGA-based Calibration, Analysis and Optimization Toolkit
Scope and design principles
Qubit coherence tracking with analytical decay estimation
Readout optimization via two-dimensional minimizer
Spectroscopic peak-finding via golden-section search
Pulse-amplitude correction and frequency correction via sparse phase estimation
Analytical decay estimation applied to Clifford randomized benchmarking
Low-latency primitives as building blocks
Continuous closed-loop on-FPGA recalibration and benchmarking
Timing and Round-trip Latency Limits
Uncertainty Scaling Under Sparse Sampling
Conclusion
Experimental apparatus and fabrication
Downsampling and moving average
...and 5 more sections

Figures (7)

Figure 1: Rapid parameter estimation on a superconducting quantum processor. (a) Tilted SEM micrograph of the 12-qubit flux-tunable transmon processor. The specific qubit used in this work is highlighted by a white dashed box. (b) Continuous $T_1$ tracking using our on-FPGA control, with roughly 200 $T_1$ estimates performed in 2 seconds of wall-clock time. (c) A $T_1$ measurement (red points) acquired with our on-FPGA loop, which performs both measurement and estimation in a closed loop taking 9.8 ms of wall-clock time. (d) A $T_1$ measurement of the qubit performed with a conventional offloading loop, taking $\sim 250\,\mathrm{ms}$ of wall-clock time and dominated by data transfer, dense sampling and software overhead.
Figure 2: Offloading versus on-FPGA calibration primitives. (a--c) Readout optimization. (a) Readout-optimization primitive and time-to-decision per iteration. (b) SNR landscape in $(\Delta f_{\mathrm{RO}},A_{\mathrm{RO}})$ with a representative Nelder--Mead trajectory (red) over a conventional sweep (green heatmap). Gray lines show the Nelder--Mead simplexes. (c) Example single-shot IQ clusters used to evaluate the SNR objective. (d--f) Drive-frequency refinement by spectroscopy. (d) Spectroscopy primitive and per-iteration time-to-decision. (e) Golden-section-search iteration trace to maximize $P_{\ket{1}}$ over $\Delta f_{01}$. (f) Spectroscopy data comparing a dense grid sampled with the offloading strategy (grey) to golden-section search executed on-FPGA (red). (g--i) $\pi$-pulse amplitude correction. (g) $\pi$-train primitive and time-to-decision. (h) Conventional amplitude scan (blue heatmap) with the three-point estimator sampling points highlighted. (i) Three-point amplitude update using $n_\pi=21$ (red) over a line cut of (h) (grey). (j--l) Clifford randomized benchmarking. (j) CRB primitive and time-to-decision. (k) Example traditional offloading-workflow CRB analysis (dense sampling and fit functions executed offline). (l) On-FPGA CRB analysis using analytical decay estimation from three sequence lengths (red) overlaid on CRB data from the offloading workflow (grey); see text for details.
Figure 3: Long-term closed-loop on-FPGA recalibration and drift-channel analysis. (a) Closed-loop protocol with alternating calibration and CRB validation, repeated with a $290\,\mathrm{ms}$ cadence and $\sim31\,\mathrm{ms}$ in-loop calibration latency. (b) Gate error over 6 hours for a static baseline, $\epsilon_g^{A}$ (grey), and continuous recalibration, $\epsilon_g^{B}$ (red), showing reduced error under closed-loop operation. For visual clarity, a rolling average of 200 is applied to the data. (c) 30-second zoom-in of (b), with a rolling average of 5 applied to the data. (d--f) Concurrently tracked drift channels: $\Gamma_1=1/T_1$, inferred detuning $\Delta f_d$, and the relative $\pi$ and $\pi/2$ pulse amplitudes. A rolling average of 200 consecutive experiments has been applied for visual clarity. For all three parameters, we plot the Allan deviation ($\mathcal{A}$) in the rightmost inset, along with fitted white noise ($W$), Lorentzian ($L$) and $1/f$ noise contributions ($1/f$). For $\Gamma_1$ we also plot the Allan deviation with the white noise contribution subtracted, $\mathcal{A}-W$ (orange curve). (g) Pearson correlations between gate error and drift channels at a rolling average of 200 points (corresponding to $\tau \approx 60\,\textrm{s}$), comparing $C(\epsilon_g^{A},x)$ and $C(\epsilon_g^{B},x)$ for the four parameters tracked in this experiment. (h) Timescale-resolved correlation difference $\Delta C(x;\tau) = C(\epsilon^B,x;\tau)-C(\epsilon^A,x;\tau)$ computed after smoothing with window $\tau$, shown alongside the Allan deviation of $\Gamma_1$ (orange, right axis), $\Delta f_\textrm{d}$ (dark blue), $A_\pi$ (blue), and $A_{\pi/2}$ (light blue). The dashed grey line marks the averaging time $\tau$ at which the correlations reported in panel (g) are evaluated.
Figure 4: Timing budget and latency limits. (a) Measured execution times $T$ for FPGA routines compared with representative WiFi/LAN round-trip delays, highlighting the communication bottleneck for offloading-based analysis. Here, $f_\mathrm{res}$ is a resonance-finding spectroscopy primitive using golden-section search, analogous to the $f_{01}$ peak-finding procedure in Fig. 2(d); $f_\mathrm{RO}$ is the readout-frequency calibration shown in Fig. 2(a); and IQ denotes a single IQ-discrimination update iteration, analogous to Fig. 2(c). $T_1$ and CRB are extracted using the same ADE-style procedure. Error bars indicate $1\sigma$ uncertainty estimated via bootstrapping over repeated measurements ($3000$ repeats for the experiment timings and $1000$ repeats for the network round-trip measurements). (b) Example timing breakdown for some of the on-FPGA calibration primitives, showing that the wall-clock time is dominated by qubit reset and acquisition rather than classical analysis (relative time not visible at this scale). (c) Example timing breakdown for a representative instantiation of the traditional offloading procedure, which is dominated by network communication.
Figure 5: Uncertainty scaling under sparse sampling. (a) Analysis of the $\pi$-train amplitude-correction estimator as a function of total measurement time (or equivalently the number of pulses). Uncertainty is estimated by bootstrapping and decreases with an increasing number of pulses, $n$, following a $1/n$ Heisenberg-like scaling. At high $n$, the analysis breaks down because the length of the $\pi$-pulse train approaches $T_1$. (b) Scaling for sparse $T_1$ estimation as a function of the wait-time scale factor ($\alpha$). For small-to-moderate $\alpha$, the statistical uncertainty (from bootstrapping) follows a $1/\sqrt{\alpha}$ trend, but it exhibits systematic deviations for long wait-time scale factors (see text for details).
...and 2 more figures
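
The Figure 3 caption reports Allan deviations of the tracked drift channels. As a reminder of what that quantity measures, here is a minimal sketch of the standard non-overlapping Allan deviation for an equally spaced time series. The function name `allan_deviation` and its arguments are hypothetical, and the captions do not say which estimator variant the authors use (an overlapping estimator is a common alternative), so this should be read as an illustration only.

```python
import math


def allan_deviation(samples, m):
    """Non-overlapping Allan deviation at an averaging factor of m samples.

    Assumes `samples` are equally spaced in time. The averaging time is
    tau = m * (sample spacing).
    """
    n_blocks = len(samples) // m
    if m < 1 or n_blocks < 2:
        raise ValueError("need at least two complete blocks of m samples")
    # Average the series over consecutive, non-overlapping blocks.
    means = [sum(samples[k * m:(k + 1) * m]) / m for k in range(n_blocks)]
    # Half the mean squared difference between adjacent block averages.
    sq_diffs = sum((b - a) ** 2 for a, b in zip(means, means[1:]))
    return math.sqrt(sq_diffs / (2.0 * (n_blocks - 1)))


# Usage: scan the averaging factor to see how fluctuations average down.
series = [0.0, 1.0] * 50  # synthetic alternating data, illustrative only
curve = [(m, allan_deviation(series, m)) for m in (1, 2, 4, 8)]
```

Plotting the result against the averaging time on log-log axes is what separates noise types: white noise falls off as the inverse square root of the averaging time, while slower drift processes flatten or rise.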
