BCQQ: Batch-Constraint Quantum Q-Learning with Cyclic Data Re-uploading

Maniraman Periyasamy, Marc Hölle, Marco Wiedmann, Daniel D. Scherer, Axel Plinge, Christopher Mutschler

TL;DR

This paper proposes a batch RL algorithm that utilizes variational quantum circuits (VQCs) as function approximators within the discrete batch-constraint deep Q-learning (BCQ) algorithm and introduces a novel data re-uploading scheme by cyclically shifting the order of input variables in the data encoding layers.

Abstract

Deep reinforcement learning (DRL) often requires large amounts of data and many environment interactions, making the training process time-consuming. This challenge is further exacerbated in batch RL, where the agent is trained solely on a pre-collected dataset without environment interactions. Recent advancements in quantum computing suggest that quantum models might require less training data than classical methods. In this paper, we investigate this potential advantage by proposing a batch RL algorithm that utilizes VQCs as function approximators within the discrete batch-constraint deep Q-learning (BCQ) algorithm. Additionally, we introduce a novel data re-uploading scheme that cyclically shifts the order of input variables in the data encoding layers. We evaluate the efficiency of our algorithm on the OpenAI CartPole environment and compare its performance to the classical neural network-based discrete BCQ.
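To make the cyclic re-uploading idea concrete, the following is a minimal sketch of how the input ordering could change from one encoding layer to the next. It is not the authors' implementation: the function names `encoding_orders` and `reupload_inputs`, the shift of exactly one position per layer, the shift direction, and the one-feature-per-qubit mapping are all assumptions made for illustration. The abstract only states that the order of input variables is cyclically shifted across the data encoding layers.

```python
import numpy as np


def encoding_orders(num_features: int, num_layers: int, cyclic: bool = True) -> np.ndarray:
    """Return an array of shape (num_layers, num_features).

    Row l lists which input feature index is encoded on each qubit in
    encoding layer l (assumption: one feature per qubit).
    """
    base = np.arange(num_features)
    if not cyclic:
        # Standard re-uploading: every layer sees the same ordering.
        return np.tile(base, (num_layers, 1))
    # Assumed cyclic scheme: layer l starts at feature (l mod num_features),
    # i.e. the ordering is rotated left by one position per layer.
    return np.stack([np.roll(base, -layer) for layer in range(num_layers)])


def reupload_inputs(state, num_layers: int, cyclic: bool = True) -> np.ndarray:
    """Arrange a state vector into per-layer encoding inputs (hypothetical helper)."""
    state = np.asarray(state, dtype=float)
    orders = encoding_orders(state.shape[0], num_layers, cyclic)
    # Fancy indexing yields shape (num_layers, num_features).
    return state[orders]


if __name__ == "__main__":
    # A CartPole observation has four variables; the values here are made up.
    example_state = [0.1, 0.2, 0.3, 0.4]
    print(reupload_inputs(example_state, num_layers=3))
```

Under these assumptions, each qubit is fed a different state variable in successive encoding layers, whereas with `cyclic=False` every layer repeats the same assignment, as in standard data re-uploading.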

Paper Structure (26 sections, 13 equations, 7 figures, 1 table, 1 algorithm)

Figures (7)

  • Figure 1: The VQC that is used as the function approximator for the algorithm. Note: Each $\vec{\theta}$ block represents a repetition of the variational layer ansatz with different trainable parameters.
  • Figure 2: Quantum agent with standard data re-uploading strategy
  • Figure 3: Quantum agent with cyclic data re-uploading strategy
  • Figure 4: Figure (a) shows the eigenvalue spectrum of the average Fisher information matrix for the classical model, plotted as a histogram with normalized counts. Figure (b) shows the eigenvalue spectrum of the average Fisher information matrix for the quantum models, plotted as a histogram with normalized counts. Figure (c) shows the effective dimension results for both classical and quantum models. The Fisher information matrix is calculated using 500 data points sampled from the CartPole-v1 states and 100 random parameter sets.
  • Figure 5: Figures (a), (b), and (c) show the learning curves of the quantum agent with the cyclic data re-uploading strategy and of different classical agents trained on partial noisy trajectories of length 25, 50, and 100, respectively. The results shown are averaged over 3 training runs, with each evaluation consisting of rewards averaged over 10 random seeds.
  • ...and 2 more figures