Automatic re-calibration of quantum devices by reinforcement learning

T. Crosta; L. Rebón; F. Vilariño; J. M. Matera; M. Bilkis

TL;DR

Calibrating quantum devices under time-varying, hard-to-model environments is challenging due to incomplete models and costly parameter measurements. The authors propose a hybrid approach that combines an effective score-based initialization with model-free reinforcement learning, augmented by a de-calibration witness to detect deployment drift. They formalize a re-calibration framework and illustrate it with a Kennedy receiver in long-distance quantum communication, demonstrating automatic recalibration with reduced experimental overhead. The results suggest robust, automated recalibration that can be extended to other control problems in quantum technologies and beyond.

Abstract

During operation, shifts in environmental conditions cause devices to drift away from their optimal settings. This is typically addressed with control loops that monitor selected variables and the device performance in order to keep the settings at their optimal values. Quantum devices are particularly challenging, since their functionality relies on precise tuning of their parameters. At the same time, detailed modeling of the environment is often computationally unaffordable, while direct measurement of the parameters defining the system state is costly and introduces extra noise into the mechanism. In this study, we investigate the application of reinforcement learning techniques to develop a model-free control loop for continuous recalibration of quantum device parameters. Furthermore, we explore the advantages of incorporating minimal environmental noise models. As an example, we present an application to numerical simulations of a long-distance quantum communication protocol based on a Kennedy receiver.
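
The idea of starting from an effective model and then fine-tuning with model-free RL can be illustrated with a toy simulation. The sketch below is not the authors' implementation. It assumes a Kennedy-type receiver discriminating two equiprobable coherent states |+alpha> and |-alpha>, a fixed decision rule (click means +alpha, no click means -alpha), and a simple epsilon-greedy bandit over a discretized displacement theta. The names `success_probability`, `run_experiment`, `calibrate`, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def success_probability(theta, alpha):
    """Model success probability for equiprobable |+alpha>, |-alpha> (assumed setup)."""
    # |-alpha> is displaced to |theta - alpha>; it is guessed correctly on "no click".
    p_no_click_minus = np.exp(-(theta - alpha) ** 2)
    # |+alpha> is displaced to |theta + alpha>; it is guessed correctly on "click".
    p_click_plus = 1.0 - np.exp(-(theta + alpha) ** 2)
    return 0.5 * (p_no_click_minus + p_click_plus)


def run_experiment(theta, alpha, rng):
    """Simulate one shot: send a random state, displace, detect, return reward 0 or 1."""
    sign = 1 if rng.random() < 0.5 else -1
    amplitude = sign * alpha + theta
    click = rng.random() < 1.0 - np.exp(-amplitude ** 2)
    guess = 1 if click else -1
    return float(guess == sign)


def calibrate(alpha_true, alpha_model, n_experiments=20000,
              epsilon=0.1, prior_weight=50, seed=0):
    """Epsilon-greedy fine-tuning of theta, initialised from an effective model."""
    rng = np.random.default_rng(seed)
    thetas = np.linspace(0.0, 2.0, 21)             # discretized displacements
    q = success_probability(thetas, alpha_model)   # effective-model initial values
    counts = np.full(thetas.shape, prior_weight)   # pseudo-counts: trust in the model
    for _ in range(n_experiments):
        if rng.random() < epsilon:
            k = rng.integers(len(thetas))          # explore a random displacement
        else:
            k = int(np.argmax(q))                  # exploit the current best estimate
        reward = run_experiment(thetas[k], alpha_true, rng)
        counts[k] += 1
        q[k] += (reward - q[k]) / counts[k]        # incremental sample average
    return thetas, q
```

Calling `calibrate(alpha_true=0.4, alpha_model=0.8)` starts from value estimates computed with a deliberately wrong amplitude and refines them using only the binary rewards, so the greedy displacement `thetas[np.argmax(q)]` can move toward the optimum of the true environment without that environment ever being modeled explicitly.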

Paper Structure (6 sections, 1 equation, 6 figures, 1 table, 2 algorithms)

Introduction
The re-calibration framework
Illustrative example and numerical development
Outlook & future research directions
Acknowledgments
Additional details in the RL implementation

Figures (6)

Figure 1: We depict a device that needs to be calibrated. Here, the apparatus is controlled by different knobs defined by values $\bm{\theta} = \lbrace \theta_1, ..., \theta_M\rbrace$, and the aim is to tune such parameters in a way that the device is configured to optimally operate under experimental conditions ${\cal E}$.
Figure 2: Single-parameter device example. Top panel: the optimal calibration score $S_{{\cal E}}(\bm{\theta})$ is shown (red dashed vertical line), together with its effective value $S_{\tilde{{\cal E}}}(\bm{\theta})$ (blue dashed vertical line); while sub-optimal, the latter is further fine-tuned by means of a model-free scheme (see main body). Bottom panel: we show the score functions $S_{\mathcal{E}_0}$ and $S_{\mathcal{E}_1}$ before and after a change-point occurs in the environment. As a consequence, the device optimally configured under $\mathcal{E}_0$ now needs to be re-calibrated to the new optimal configuration for $\mathcal{E}_1$.
Figure 3: Diagram of a Kennedy receiver, which consists of applying a displacement $\theta$ to the incoming signal and measuring it with an on/off photodetector.
Figure 4: Recalibration and learning curve. (Top): Learning curve evolution. We show the average reward acquired during the last $10^3$ experiments, $\langle Rew\rangle = \sum_{k=0}^{10^3}\frac{r_{t-k}}{10^3}$, and the evolution of the de-calibration witness $\mathcal{W}_{d}$ estimated from measurement statistics. At the change-point, the witness exhibits a large fluctuation, which triggers a recalibration of the system until the agent converges to the optimal reward. (Bottom): Update of the Q-values curve (left) and evolution of the agent's greedy strategy, i.e. the configuration the calibrating agent would choose at each experiment (right).
Figure 5: We show the average internal strategy of $25$ calibrating agents to fine-tune the device configuration under a change-of-prior scenario. Specifically, we depict the $Q$-values (left panel) from the effective model, and the ones obtained after RL fine-tuning. In the right panel we show the evolution of agent’s greedy strategy.
...and 1 more figure
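
The running-average reward described in the Figure 4 caption, and the idea of flagging a change-point from measurement statistics, can be sketched as follows. This is only an illustrative stand-in: the paper's de-calibration witness $\mathcal{W}_{d}$ is not defined in this summary, so the class below simply assumes a witness that compares a windowed average reward against an expected value. The class name `RewardDriftWitness`, the window length, and the threshold are hypothetical.

```python
from collections import deque


class RewardDriftWitness:
    """Windowed average reward with a threshold test (assumed, simplified witness)."""

    def __init__(self, expected_reward, window=1000, threshold=0.05):
        self.expected_reward = expected_reward
        self.threshold = threshold
        self.rewards = deque(maxlen=window)  # keeps only the most recent rewards
        self.total = 0.0

    def running_average(self):
        """Average reward over the rewards currently held in the window."""
        if not self.rewards:
            return 0.0
        return self.total / len(self.rewards)

    def is_decalibrated(self):
        """Flag drift only once the window is full, to avoid early false alarms."""
        full = len(self.rewards) == self.rewards.maxlen
        deviation = abs(self.running_average() - self.expected_reward)
        return full and deviation > self.threshold

    def update(self, reward):
        """Record one reward and report whether the device looks de-calibrated."""
        if len(self.rewards) == self.rewards.maxlen:
            self.total -= self.rewards[0]    # this value is evicted by the append below
        self.rewards.append(reward)
        self.total += reward
        return self.is_decalibrated()
```

In a control loop, a `True` return from `update` would be the signal to resume exploration and re-tune the device; the threshold trades off sensitivity to genuine drift against false alarms from statistical fluctuations.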
