Achieving fast and robust perfect entangling gates via reinforcement learning
Leander Grech, Matthias G. Krauss, Mirko Consiglio, Tony J. G. Apollaro, Christiane P. Koch, Simon Hirlaender, Gianluca Valentino
TL;DR
This work addresses fast and robust perfect entangling (PE) gates on noisy quantum devices by reframing quantum optimal control as a reinforcement learning (RL) problem. The authors introduce ZCQPEE, a parametrized RL environment, and train a Trust Region Policy Optimization (TRPO) agent to generate smooth pulses that realize PE gates in a three-qutrit setting mediated by a tunable coupler. They benchmark the RL-generated pulses against gradient-based optimal control theory (OCT) baselines and show near time-optimal performance approaching the quantum speed limit (for example, a final pulse time of about $10\,\mathrm{ns}$ with $J_T\sim 10^{-4}$ under amplitudes of $\pm 1.5\,\mathrm{GHz}$). They also report emergent robustness to parameter drift, including under domain randomization. The results indicate that RL can generalize across unseen Hamiltonian variations and reduce calibration overhead, which suggests broad hardware-agnostic applicability and a path toward experimental validation and extensions to open-system dynamics.
Abstract
Noisy intermediate-scale quantum computers hold the promise of tackling complex and otherwise intractable computational challenges through the massive parallelism offered by qubits. Central to realizing the potential of quantum computing are perfect entangling (PE) two-qubit gates, which serve as a critical building block for universal quantum computation. In the context of quantum optimal control, shaping electromagnetic pulses to drive quantum gates is crucial for pushing gate performance toward theoretical limits. In this work, we leverage reinforcement learning (RL) techniques to discover near-optimal pulse shapes that yield PE gates. A collection of RL agents is trained within robust simulation environments, enabling the identification of effective control strategies even under noisy conditions. Selected agents are then validated on higher-fidelity simulations, illustrating how RL-based methods can reduce calibration overhead when compared to quantum optimal control techniques. Furthermore, the RL approach is hardware agnostic with the potential for broad applicability across various quantum computing platforms.
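To make the framing of pulse shaping as an RL problem concrete, the following is a minimal sketch and not the authors' environment. It assumes a toy two-qubit model with a single $\sigma_x\otimes\sigma_x$ coupling and piecewise-constant pulses in arbitrary units ($\hbar = 1$), rather than the three-qutrit system with a tunable coupler. The names `ToyPulseEnv`, `local_invariants`, and `cnot_class_cost` are hypothetical. As a stand-in for the PE functional $J_T$, the terminal cost is the distance of Makhlin's local invariants $(G_1, G_2)$ from the CNOT class values $(0, 1)$, which is a narrower target than an arbitrary perfect entangler. No agent is included; any policy-gradient method such as TRPO would interact with the environment through `reset` and `step`.

```python
import numpy as np

# Toy coupling term X (x) X; an assumption, not the Hamiltonian used in the paper.
X = np.array([[0, 1], [1, 0]], dtype=complex)
XX = np.kron(X, X)
I4 = np.eye(4, dtype=complex)

# Transformation to the magic (Bell) basis.
Q = np.array([[1, 0, 0, 1j],
              [0, 1j, 1, 0],
              [0, 1j, -1, 0],
              [1, 0, 0, -1j]], dtype=complex) / np.sqrt(2)


def local_invariants(U):
    """Makhlin's local invariants (G1, G2) of a 4x4 unitary U."""
    m = Q.conj().T @ U @ Q
    M = m.T @ m
    det = np.linalg.det(U)
    tr = np.trace(M)
    g1 = tr**2 / (16 * det)
    g2 = (tr**2 - np.trace(M @ M)) / (4 * det)
    return g1, g2


def cnot_class_cost(U):
    """Squared distance of (G1, G2) from the CNOT class values (0, 1)."""
    g1, g2 = local_invariants(U)
    return float(abs(g1) ** 2 + abs(g2 - 1) ** 2)


class ToyPulseEnv:
    """Hypothetical episodic environment: one action = one pulse slice."""

    def __init__(self, n_steps=10, dt=1.0, max_amp=1.5):
        self.n_steps, self.dt, self.max_amp = n_steps, dt, max_amp
        self.reset()

    def _obs(self):
        # Observation: real and imaginary parts of the accumulated unitary.
        return np.concatenate([self.U.real.ravel(), self.U.imag.ravel()])

    def reset(self):
        self.t = 0
        self.U = I4.copy()
        return self._obs()

    def step(self, action):
        # Clip the requested amplitude to the allowed range.
        amp = float(np.clip(action, -self.max_amp, self.max_amp))
        theta = amp * self.dt
        # exp(-i * theta/2 * XX), in closed form because XX @ XX = identity.
        slice_U = np.cos(theta / 2) * I4 - 1j * np.sin(theta / 2) * XX
        self.U = slice_U @ self.U
        self.t += 1
        done = self.t >= self.n_steps
        # Sparse reward: negative cost, given only at the end of the pulse.
        reward = -cnot_class_cost(self.U) if done else 0.0
        return self._obs(), reward, done


# Usage: a constant pulse whose total rotation angle is pi/2.
env = ToyPulseEnv()
env.reset()
done = False
while not done:
    obs, reward, done = env.step(np.pi / 20)
print(reward)  # close to zero for this pulse
```

In this toy model the constant pulse accumulates $\exp(-i\frac{\pi}{4}\,\sigma_x\otimes\sigma_x)$, which is locally equivalent to CNOT, so the terminal reward is close to zero. A trained agent would have to find such a pulse from the reward signal alone, and randomizing `dt` or the coupling strength between episodes would be one simple way to mimic the domain randomization mentioned above.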
