Achieving fast and robust perfect entangling gates via reinforcement learning

Leander Grech, Matthias G. Krauss, Mirko Consiglio, Tony J. G. Apollaro, Christiane P. Koch, Simon Hirlaender, Gianluca Valentino

TL;DR

This work tackles the challenge of achieving fast and robust perfect entangling (PE) gates on noisy quantum devices by reframing quantum optimal control (OCT) as a reinforcement learning (RL) problem. The authors introduce ZCQPEE, a parametrized RL environment, and train a Trust Region Policy Optimization agent to generate smooth pulses that realize PE gates in a three-qutrit setting mediated by a tunable coupler. They benchmark RL-generated pulses against gradient-based OCT baselines, demonstrate near time-optimal performance approaching the quantum speed limit (for example, a final pulse time of about $10\,\mathrm{ns}$ with $J_T\sim 10^{-4}$ under amplitudes of $\pm 1.5\,\mathrm{GHz}$), and uncover emergent robustness to parameter drift, including with domain randomization. The results indicate that RL can generalize across unseen Hamiltonian variations and reduce calibration overhead, pointing to broad hardware-agnostic applicability and a path toward experimental validation and extensions to open-system dynamics.

Abstract

Noisy intermediate-scale quantum computers hold the promise of tackling complex and otherwise intractable computational challenges through the massive parallelism offered by qubits. Central to realizing the potential of quantum computing are perfect entangling (PE) two-qubit gates, which serve as a critical building block for universal quantum computation. In the context of quantum optimal control, shaping electromagnetic pulses to drive quantum gates is crucial for pushing gate performance toward theoretical limits. In this work, we leverage reinforcement learning (RL) techniques to discover near-optimal pulse shapes that yield PE gates. A collection of RL agents is trained within robust simulation environments, enabling the identification of effective control strategies even under noisy conditions. Selected agents are then validated on higher-fidelity simulations, illustrating how RL-based methods can reduce calibration overhead when compared to quantum optimal control techniques. Furthermore, the RL approach is hardware agnostic with the potential for broad applicability across various quantum computing platforms.

Paper Structure

This paper contains 26 sections, 10 equations, 20 figures, 2 tables.

Figures (20)

  • Figure 1: Diagram of the system consisting of two fixed-frequency qutrits, $Q_1$ and $Q_2$, and a tunable central bus qutrit, $Q_c$.
  • Figure 2: At each ZCQPEE step, the agent outputs a vector of pulse deltas $\Delta u(t)$ (black nodes), which are cumulatively summed, and applied in sequence over $K=3$ time steps. Observations $o_t$ are returned at a reduced frequency and reflect the quantum state evolution after each pulse segment. The generated pulse is collected on the terminal state, $o_T$, at time $t=T$.
  • Figure 3: Smallest achieved $J_T$ values as a function of the final pulse time $T$ for varying maximal pulse amplitudes. The QSL is identified as the shortest time where the curves deviate from the PE plateau at $J_T=10^{-4}$.
  • Figure 4: Optimized pulses generated using Krotov's method (blue/red). The two pulses were optimized with different guess pulses (cyan/orange). (a) Guess A: Good flat-top guess (cyan) and optimized (blue) pulse amplitudes over time. (b) Guess B: Bad single-frequency guess (orange) and optimized (red) pulse amplitudes over time. (c) & (d) show the FFT of the optimized and guess pulses, respectively.
  • Figure 5: Heatmap showing the spectral evolution of generated pulses throughout RL training (log scale). The $y$-axis represents frequency (1/ns), and the color indicates the pulse spectra (Fast Fourier Transform amplitude) at different training checkpoints, capturing the learned frequency components.
  • ...and 15 more figures
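
The pulse construction described in the Figure 2 caption (per-step pulse deltas that are cumulatively summed) and the spectral view in the Figure 5 caption (FFT amplitude of the generated pulse) can be illustrated with a minimal sketch. This is not the authors' ZCQPEE implementation: the function names `build_pulse` and `pulse_spectrum`, the example delta values, the sample spacing `dt`, and the choice to clip the summed pulse to the $\pm 1.5\,\mathrm{GHz}$ bound are all assumptions made for illustration.

```python
import numpy as np


def build_pulse(deltas, u_max=1.5):
    """Turn per-step pulse deltas into a pulse by cumulative summation.

    deltas: array of shape (n_steps, K), one row of K deltas per agent step.
    u_max:  amplitude bound; clipping after the sum is an assumption here,
            not a detail confirmed by the source.
    """
    flat = np.asarray(deltas, dtype=float).ravel()  # apply rows in sequence
    pulse = np.cumsum(flat)                         # running sum of deltas
    return np.clip(pulse, -u_max, u_max)


def pulse_spectrum(pulse, dt):
    """Return (frequencies, FFT amplitudes) of a real-valued pulse."""
    freqs = np.fft.rfftfreq(len(pulse), d=dt)  # in 1/ns if dt is in ns
    amps = np.abs(np.fft.rfft(pulse))
    return freqs, amps


# Hypothetical example: 4 agent steps with K = 3 deltas each.
deltas = np.array([
    [0.5, 0.5, 0.5],
    [0.25, 0.0, -0.25],
    [0.0, 0.0, 0.0],
    [-0.5, -0.5, -0.5],
])
pulse = build_pulse(deltas)

# Assumed sample spacing: a 10 ns pulse divided over the 12 samples.
freqs, amps = pulse_spectrum(pulse, dt=10.0 / len(pulse))
```

Because the agent emits increments rather than amplitudes, small deltas yield a slowly varying pulse, which is one plausible reading of why the generated pulses are described as smooth. In this toy input the running sum briefly reaches 1.75 and is clipped back to 1.5.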