Challenges for Reinforcement Learning in Quantum Circuit Design
Philipp Altmann, Jonas Stein, Michael Kölle, Adelina Bärligea, Thomas Gabor, Thomy Phan, Sebastian Feld, Claudia Linnhoff-Popien
TL;DR
The paper addresses automated design of quantum circuits for NISQ hardware using reinforcement learning, introducing qcd-gym to learn policies over a universal gate set under hardware constraints. Two objective measures are used. For state preparation (SP), the measure is the fidelity between the prepared state $|\Phi\rangle$ and the target state $|\Psi\rangle$, $F = |\langle \Psi | \Phi \rangle|^2$. For unitary composition (UC), it is the distance between the target unitary $\boldsymbol{U}$ and the composed unitary $\boldsymbol{V}(\boldsymbol{\nabla})$, $D(\boldsymbol{U}, \boldsymbol{V}(\boldsymbol{\nabla})) = \lVert \boldsymbol{U} - \boldsymbol{V}(\boldsymbol{\nabla}) \rVert^2$, with a bounded similarity reward $R^{UC} = 1 - \tan^{-1}(D(\boldsymbol{U}, \boldsymbol{V}(\boldsymbol{\nabla})))$. Contributions: formulate the SP and UC objectives, define qcd-gym as an MDP, and benchmark model-free RL algorithms (A2C, PPO, TD3, SAC) against a genetic algorithm (GA) and a random baseline, revealing both the potential and the current challenges of RL in quantum circuit design (QCD). Findings show that RL can learn nontrivial circuits and that SAC often performs best, but robust, hardware-aware scaling requires improved reward shaping, better handling of partial observability, and integration with hardware constraints.
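To make the two objective measures above concrete, the following is a minimal sketch in plain Python. It is not qcd-gym's API: the function names `fidelity`, `uc_distance`, and `uc_reward` are hypothetical, states are plain lists of complex amplitudes, and the norm is assumed to be the Frobenius norm, since the summary does not say which norm is meant.

```python
import math

def fidelity(target, prepared):
    """Pure-state fidelity |<target|prepared>|^2 for normalized amplitude lists."""
    overlap = sum(complex(t).conjugate() * p for t, p in zip(target, prepared))
    return abs(overlap) ** 2

def uc_distance(u, v):
    """Squared distance ||U - V||^2 (assumption: Frobenius norm)."""
    return sum(
        abs(a - b) ** 2
        for row_u, row_v in zip(u, v)
        for a, b in zip(row_u, row_v)
    )

def uc_reward(u, v):
    """Bounded similarity reward 1 - arctan(D); equals 1 when U and V coincide."""
    return 1.0 - math.atan(uc_distance(u, v))

s = 1.0 / math.sqrt(2.0)
zero_state = [1.0, 0.0]            # |0>
plus_state = [s, s]                # |+> = (|0> + |1>) / sqrt(2)
identity = [[1.0, 0.0], [0.0, 1.0]]
hadamard = [[s, s], [s, -s]]

sp_fidelity = fidelity(plus_state, zero_state)     # empty circuit vs. target |+>
uc_dist = uc_distance(hadamard, identity)          # empty circuit vs. target H
uc_rew_empty = uc_reward(hadamard, identity)
uc_rew_exact = uc_reward(hadamard, hadamard)
```

In this toy case the empty circuit has fidelity 0.5 with the target $|+\rangle$ and squared distance 4 from the Hadamard gate. Because $\tan^{-1}$ maps $[0, \infty)$ into $[0, \pi/2)$, the reward stays within $(1 - \pi/2, 1]$ however large the distance becomes.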
Abstract
Quantum computing (QC) in the current NISQ era is still limited in size and precision. Hybrid applications mitigating those shortcomings are prevalent to gain early insight and advantages. Hybrid quantum machine learning (QML) comprises both the application of QC to improve machine learning (ML) and ML to improve QC architectures. This work considers the latter, leveraging reinforcement learning (RL) to improve quantum circuit design (QCD), which we formalize by a set of generic objectives. Furthermore, we propose qcd-gym, a concrete framework formalized as a Markov decision process, to enable learning policies capable of controlling a universal set of continuously parameterized quantum gates. Finally, we provide benchmark comparisons to assess the shortcomings and strengths of current state-of-the-art RL algorithms.
