A Survey on Quantum Reinforcement Learning

Nico Meyer; Christian Ufrecht; Maniraman Periyasamy; Daniel D. Scherer; Axel Plinge; Christopher Mutschler

TL;DR

This survey maps the landscape of quantum reinforcement learning, spanning quantum-inspired methods, variational quantum circuits, and fully quantum algorithms that leverage subroutines like amplitude estimation and Grover search. It highlights how near-term devices enable hybrid quantum-classical RL with variational function approximators, while also detailing fully quantum approaches that promise provable advantages in restricted settings. The key contributions include a taxonomy of algorithm classes, critical assessments of demonstrated results, and a synthesis of architectural and data-encoding considerations that influence trainability and scalability. The study underscores that, despite intriguing progress, broad quantum advantage remains elusive on current hardware, and it outlines concrete directions (architecture design, offline and multi-agent RL, and quantum-accessible environments) for achieving practical impact.

Abstract

Quantum reinforcement learning is an emerging field at the intersection of quantum computing and machine learning. While we intend to provide a broad overview of the literature on quantum reinforcement learning (our interpretation of this term will be clarified below), we put particular emphasis on recent developments. With a focus on already available noisy intermediate-scale quantum devices, these include variational quantum circuits acting as function approximators in an otherwise classical reinforcement learning setting. In addition, we survey quantum reinforcement learning algorithms based on future fault-tolerant hardware, some of which come with a provable quantum advantage. We provide both a bird's-eye view of the field and summaries and reviews for selected parts of the literature.

Paper Structure (60 sections, 37 equations, 18 figures, 46 tables)

Introduction and Overview
Classical Reinforcement Learning
Reinforcement Learning as a Markov Decision Process
Long Term Reward as Objective
Policy, Value Functions and Optimality
Solving and Approximating the Bellman Equation
The Quantum Computing Paradigm
Single and Multi-Qubit Systems
Evolution of Closed Quantum Systems
Extracting Classical Information via Measurements
Quantum Machine Learning with Variational Quantum Circuits
Quantum Reinforcement Learning Algorithms
Quantum-Inspired Reinforcement Learning based on Amplitude Amplification
Quantum reinforcement learning, Dong et al. (2008) and related work
Quantum Reinforcement Learning with Variational Quantum Circuits
...and 45 more sections

Figures (18)

Figure 1: A possible classification matrix for quantum reinforcement learning (QRL) algorithms, taking into account only those variants on which we focus in \ref{sec:QRLAlgs}. The algorithm classes are ordered according to their degree of quantum-classical hybridization, ranging from purely classical to purely quantum. A more detailed review of the $22$ selected works on quantum-inspired algorithms can be found in \ref{subsec:quantum_inspired}. Approaches based on variational quantum circuits (VQCs) are summarized in some detail in \ref{subsec:VQC_based}, comprising $68$ papers. QRL algorithms that employ post-NISQ quantum algorithms as subroutines, or even fully quantum approaches to reinforcement learning, are described in \ref{subsec:projective_simulaton}, \ref{subsec:boltzman_machines}, \ref{subsec:QPI} and \ref{subsec:Oracles}, based on $30$ selected manuscripts. The dashed vertical line between classical and NISQ compute resources indicates that it is presently unclear whether QRL with NISQ-compatible algorithms offers a robust quantum advantage on a broad range of learning problems. The solid vertical line distinguishes post-NISQ algorithms from both classical and NISQ-compatible algorithms, as they typically come with a guaranteed quantum advantage (at least relative to their classical counterparts).
Figure 2: Interaction between agent and environment for one timestep of a task.
Figure 3: Bloch sphere representation of a $1$-qubit state.
Figure 4: Circuit symbols of various quantum operators (gates).
Figure 5: Variational quantum circuit consisting of feature map, variational layer, and measurement.
...and 13 more figures
