BCQQ: Batch-Constraint Quantum Q-Learning with Cyclic Data Re-uploading

Maniraman Periyasamy; Marc Hölle; Marco Wiedmann; Daniel D. Scherer; Axel Plinge; Christopher Mutschler

TL;DR

This paper proposes a batch RL algorithm that utilizes variational quantum circuits (VQCs) as function approximators within the discrete batch-constraint deep Q-learning (BCQ) algorithm and introduces a novel data re-uploading scheme by cyclically shifting the order of input variables in the data encoding layers.

Abstract

Deep reinforcement learning (DRL) often requires a large amount of data and many environment interactions, making the training process time-consuming. This challenge is further exacerbated in the case of batch RL, where the agent is trained solely on a pre-collected dataset without environment interactions. Recent advancements in quantum computing suggest that quantum models might require less data for training compared to classical methods. In this paper, we investigate this potential advantage by proposing a batch RL algorithm that uses variational quantum circuits (VQCs) as function approximators within the discrete batch-constraint deep Q-learning (BCQ) algorithm. Additionally, we introduce a novel data re-uploading scheme that cyclically shifts the order of input variables in the data encoding layers. We evaluate the efficiency of our algorithm on the OpenAI CartPole environment and compare its performance to the classical neural network-based discrete BCQ.
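
The cyclic re-uploading idea mentioned in the abstract can be illustrated with a minimal sketch. It only shows which input variable would reach which encoding position in each layer; it does not build a quantum circuit. The function name `cyclic_encoding_orders`, the shift of exactly one position per layer, the shift direction, and the example state values are all assumptions made for illustration, since the abstract states only that the input order is cyclically shifted.

```python
from typing import List, Sequence


def cyclic_encoding_orders(state: Sequence[float], n_layers: int) -> List[List[float]]:
    """Return the input ordering fed to each data encoding layer.

    Assumption: layer k receives the state rotated left by k positions,
    so position i in layer k encodes state[(i + k) % len(state)].
    """
    n = len(state)
    orders = []
    for layer in range(n_layers):
        shift = layer % n  # wrap around once every variable has visited every position
        orders.append(list(state[shift:]) + list(state[:shift]))
    return orders


# Hypothetical 4-dimensional state (CartPole observations have four variables).
state = [0.1, 0.2, 0.3, 0.4]
orders = cyclic_encoding_orders(state, n_layers=5)
for k, order in enumerate(orders):
    print(f"layer {k}: {order}")
```

Under these assumptions, a standard re-uploading scheme would correspond to using `orders[0]` in every layer, whereas the cyclic variant lets each encoding position see a different input variable from one layer to the next.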

Paper Structure (26 sections, 13 equations, 7 figures, 1 table, 1 algorithm)

Introduction
Theoretical Background
General Framework of Reinforcement Learning
Online vs. Batch and On-Policy vs. Off-Policy RL
Deep Q-Learning
Variational Quantum Circuits for RL
Efficient Gradient Estimation on Quantum Devices
Related Work
Quantum Reinforcement Learning
Batch Reinforcement Learning
Batch-Constraint Deep Q-Learning
BCQQ
RL Environment and Offline Data Collection
Variational Quantum Circuit
Data Re-Uploading
...and 11 more sections

Figures (7)

Figure 1: The VQC that is used as the function approximator for the algorithm. Note: Each $\vec{\theta}$ block represents the repetition of the variational layer ansatz with different trainable parameters.
Figure 2: Quantum agent with standard data re-uploading strategy
Figure 3: Quantum agent with cyclic data re-uploading strategy
Figure 4: Panel (a) shows the eigenvalue spectrum of the average Fisher information matrix (FIM) for the classical model, plotted as a histogram with normalized counts. Panel (b) shows the eigenvalue spectrum of the average FIM for the quantum models, plotted as a histogram with normalized counts. Panel (c) shows the effective dimension results for both classical and quantum models. The FIM is calculated using 500 data points sampled from the CartPole-v1 states and 100 random parameter sets.
Figure 5: Panels (a), (b), and (c) show the learning curves of the quantum agent with the cyclic data re-uploading strategy and of different classical agents trained on partial noisy trajectories of length 25, 50, and 100, respectively. The results shown are averaged over 3 training runs, with each evaluation consisting of rewards averaged over 10 random seeds.
...and 2 more figures
