Robustness and Generalization in Quantum Reinforcement Learning via Lipschitz Regularization

Nico Meyer, Julian Berberich, Christopher Mutschler, Daniel D. Scherer

TL;DR

This paper addresses the robustness and generalization of quantum reinforcement learning by combining principles from quantum computing and control theory, and proposes a regularized version of a quantum policy gradient approach, named the RegQPG algorithm.

Abstract

Quantum machine learning leverages quantum computing to enhance accuracy and reduce model complexity compared to classical approaches, promising significant advancements in various fields. Within this domain, quantum reinforcement learning has garnered attention, often realized using variational quantum circuits to approximate the policy function. This paper addresses the robustness and generalization of quantum reinforcement learning by combining principles from quantum computing and control theory. Leveraging recent results on robust quantum machine learning, we utilize Lipschitz bounds to propose a regularized version of a quantum policy gradient approach, named the RegQPG algorithm. We show that training with RegQPG improves the robustness and generalization of the resulting policies. Furthermore, we introduce an algorithmic variant that incorporates curriculum learning, which minimizes failures during training. Our findings are validated through numerical experiments, demonstrating the practical benefits of our approach.

Paper Structure

This paper contains 8 sections, 19 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: The quantum model used as the policy approximator in the proposed RegQPG algorithm. The parametrized unitaries are composed of single-qubit rotations with trainable variational weights $\nu_j$ and encoding parameters $\omega_j$. This is followed by a static entanglement unitary, and the block is repeated several times. At the end, a tensored Pauli-Z observable is measured and plugged into \ref{eq:policy_postprocess} to approximate the policy value.
  • Figure 2: Sketch of the CartPole environment; see also \ref{tab:cartpole} for details on observations.
  • Figure 3: Training using the RegQPG algorithm on the CartPole environment. One epoch performs updates on trajectories from $10$ environment instances. The strength of Lipschitz regularization is indicated by $\lambda$. Training performance is averaged over $100$ random seeds.
  • Figure 4: Lipschitz regularization enhances robustness of policies trained with the RegQPG algorithm. The models from \ref{fig:train} are tested on modified CartPole environments with observation perturbation, i.e. additive zero-mean Gaussian noise with increasing variance. Testing performance is averaged over $100$ random seeds for each model, resulting in $10000$ samples contributing to each data point.
  • Figure 5: Lipschitz regularization enhances generalization of policies trained with the RegQPG algorithm. During training, all features (cart position, cart velocity, pole angle, pole angular velocity) are by default initialized uniformly at random in $\left[ -0.05, 0.05 \right]$. We test on a wider range of pole angles and angular velocities, and report the rate of attraction for these configurations, i.e. the fraction of successful test runs; see also \ref{eq:rate_of_attraction}. The $100$ models for each regularization rate are evaluated for $100$ runs each, i.e. $10000$ samples contribute to each data point. Regions are highlighted as significantly (slightly) better or worse where the policies obtained via regularized training deviate from the non-regularized baseline and the corresponding full (half) confidence intervals, defined by the variances, do not overlap.
  • ...and 1 more figure
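
The two evaluation quantities named in the captions of Figures 4 and 5 can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names (`perturb_observation`, `with_observation_noise`, `rate_of_attraction`) and the stand-in `toy_policy` are hypothetical, noise is assumed to be drawn independently per feature, and no CartPole simulation is included. The feature order follows the Figure 5 caption (cart position, cart velocity, pole angle, pole angular velocity).

```python
import math
import random
from typing import Callable, Iterable, List, Sequence

Observation = Sequence[float]
Policy = Callable[[Observation], int]


def perturb_observation(obs: Observation, variance: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise of the given variance to each feature.

    Assumption: the noise is independent across features.
    """
    sigma = math.sqrt(variance)
    return [x + rng.gauss(0.0, sigma) for x in obs]


def with_observation_noise(policy: Policy, variance: float, rng: random.Random) -> Policy:
    """Wrap a policy so that it only ever sees perturbed observations."""
    def noisy_policy(obs: Observation) -> int:
        return policy(perturb_observation(obs, variance, rng))
    return noisy_policy


def rate_of_attraction(successes: Iterable[bool]) -> float:
    """Fraction of successful test runs."""
    outcomes = list(successes)
    if not outcomes:
        raise ValueError("at least one test run is required")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


# Hypothetical stand-in for a trained policy:
# push right (1) if the pole leans right, otherwise push left (0).
def toy_policy(obs: Observation) -> int:
    return 1 if obs[2] > 0.0 else 0


rng = random.Random(0)
noisy = with_observation_noise(toy_policy, variance=0.01, rng=rng)
action = noisy([0.0, 0.0, 0.03, 0.0])  # 0 or 1, depending on the sampled noise
print(rate_of_attraction([True, True, False, True]))  # 3 of 4 runs succeeded
```

In an actual robustness test, the wrapped policy would be rolled out in the environment for each noise variance, and the per-run success flags would be passed to `rate_of_attraction`.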