A quantum-classical reinforcement learning model to play Atari games
Dominik Freinberger, Julian Lemmel, Radu Grosu, Sofiene Jerbi
TL;DR
This work investigates a hybrid quantum-classical reinforcement learning framework for high-dimensional observation spaces, evaluated on the Atari games Pong and Breakout. It combines classical feature extraction with a parametrized quantum circuit (PQC) that encodes the latent features and outputs Q-values via local Pauli-$Z$ measurements followed by a linear post-processing layer. Training uses approximate Q-learning with experience replay and a target network. The study shows that the hybrid model can learn Pong and approach the classical baseline in Breakout. It also analyzes how design choices, especially latent-space dimensionality and reward scaling, shape performance, and it offers guidance for fair benchmarking of quantum components. While no quantum advantage is demonstrated, the results advance understanding of the quantum-classical interplay in RL and point to future directions such as robustness to noise and task domains where quantum effects may be more beneficial.
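To make the pipeline described above concrete, the following is a minimal NumPy sketch of the quantum part of such a model: a latent feature vector is encoded into qubit rotations, a variational circuit is applied, local Pauli-$Z$ expectation values are read out, and a linear layer maps them to Q-values. The specific circuit layout (RY encoding gates, RY variational gates, a chain of CZ gates, and re-encoding of the features in every layer) and all function names are illustrative assumptions, not the architecture used by the authors. The classical feature extractor is omitted, and `latent` stands in for its output.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation about the Y axis (real-valued 2x2 matrix)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_single_qubit(state, gate, qubit, n_qubits):
    """Apply a 2x2 gate to one qubit of an n-qubit state vector."""
    psi = state.reshape([2] * n_qubits)
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))
    psi = np.moveaxis(psi, 0, qubit)  # restore the original qubit ordering
    return psi.reshape(-1)

def apply_cz(state, q1, q2, n_qubits):
    """Apply a controlled-Z gate: flip the sign where both qubits are |1>."""
    psi = state.reshape([2] * n_qubits).copy()
    idx = [slice(None)] * n_qubits
    idx[q1], idx[q2] = 1, 1
    psi[tuple(idx)] *= -1
    return psi.reshape(-1)

def z_expectations(state, n_qubits):
    """Expectation value of Pauli-Z on each qubit separately."""
    probs = np.abs(state.reshape([2] * n_qubits)) ** 2
    out = []
    for q in range(n_qubits):
        other_axes = tuple(a for a in range(n_qubits) if a != q)
        p = probs.sum(axis=other_axes)  # marginal distribution of qubit q
        out.append(p[0] - p[1])
    return np.array(out)

def q_values(latent, thetas, w, b):
    """Hypothetical hybrid head: latent features -> PQC -> <Z_i> -> linear layer.

    latent: (n_qubits,) latent features, used as rotation angles (assumption).
    thetas: (n_layers, n_qubits) trainable rotation angles.
    w, b:   weights (n_actions, n_qubits) and bias (n_actions,) of the
            linear post-processing layer.
    """
    n_qubits = latent.shape[0]
    state = np.zeros(2 ** n_qubits)
    state[0] = 1.0  # start in |0...0>
    for layer_params in thetas:
        for q in range(n_qubits):
            state = apply_single_qubit(state, ry(latent[q]), q, n_qubits)        # encoding
            state = apply_single_qubit(state, ry(layer_params[q]), q, n_qubits)  # variational
        for q in range(n_qubits - 1):
            state = apply_cz(state, q, q + 1, n_qubits)                          # entangling
    return w @ z_expectations(state, n_qubits) + b

# Example usage with random parameters (4 qubits, 2 layers, 3 actions).
rng = np.random.default_rng(0)
latent = rng.uniform(-np.pi, np.pi, size=4)
thetas = rng.normal(size=(2, 4))
w, b = rng.normal(size=(3, 4)), np.zeros(3)
print("Q-values:", q_values(latent, thetas, w, b))
print("Greedy action:", int(np.argmax(q_values(latent, thetas, w, b))))
```

Because each Pauli-$Z$ expectation value lies in $[-1, 1]$, the linear layer is what lets the outputs reach the scale of the returns in a given game. This is one plausible reason why reward scaling interacts with the quantum component.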
Abstract
Recent advances in reinforcement learning have demonstrated the potential of quantum learning models based on parametrized quantum circuits (PQCs) as an alternative to deep learning models. On the one hand, these findings have shown the ultimate exponential speed-ups in learning that full-blown quantum models can offer in certain, artificially constructed, environments. On the other hand, they have demonstrated the ability of experimentally accessible PQCs to solve OpenAI Gym benchmarking tasks. However, it remains an open question whether these near-term quantum reinforcement learning (QRL) techniques can be successfully applied to more complex problems exhibiting high-dimensional observation spaces. In this work, we bridge this gap and present a hybrid model combining a PQC with classical feature encoding and post-processing layers that is capable of tackling Atari games. A classical model, subjected to architectural restrictions similar to those present in the hybrid model, is constructed to serve as a reference. Our numerical investigation demonstrates that the proposed hybrid model is capable of solving the Pong environment and achieving scores comparable to the classical reference in Breakout. Furthermore, our findings shed light on important hyperparameter settings and design choices that impact the interplay of the quantum and classical components. This work contributes to the understanding of near-term quantum learning models and takes an important step towards their deployment in real-world RL scenarios.
