Quantum Boltzmann Machines for Sample-Efficient Reinforcement Learning

Thore Gerlach, Michael Schenk, Verena Kain

TL;DR

This work tackles sample efficiency in continuous-action reinforcement learning by introducing Continuous Semi-Quantum Boltzmann Machines (CSQBMs), a hybrid quantum–classical energy-based framework that uses an exponential-family prior on visible units together with quantum hidden units to dramatically reduce qubit requirements while retaining expressiveness. It shows that gradients with respect to continuous inputs can be computed analytically, enabling direct integration into Actor–Critic schemes, and replaces the difficult global maximization in $Q$-learning with sampling from the CSQBM distribution to perform continuous $Q$-learning. Theoretical contributions establish a principled CSQBM formulation with tractable gradient propagation through visible units and a Gibbs-sampling-based inference mechanism that yields action samples via alternating $p(\boldsymbol{a}|\boldsymbol{s},\boldsymbol{h})$ and $p(\boldsymbol{h}|\boldsymbol{s},\boldsymbol{a})$ under efficiently preparable Gibbs states. The proposed approach promises improved sample efficiency and expressiveness for challenging continuous control tasks, with potential applications to precision physics settings such as beamline control at CERN.

Abstract

We introduce theoretically grounded Continuous Semi-Quantum Boltzmann Machines (CSQBMs) that support continuous-action reinforcement learning. By combining exponential-family priors over visible units with quantum Boltzmann distributions over hidden units, CSQBMs yield a hybrid quantum-classical model that reduces qubit requirements while retaining strong expressiveness. Crucially, gradients with respect to continuous variables can be computed analytically, enabling direct integration into Actor-Critic algorithms. Building on this, we propose a continuous Q-learning framework that replaces global maximization with efficient sampling from the CSQBM distribution, thereby overcoming instability issues in continuous control.
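
The alternating sampling between $p(\boldsymbol{a}|\boldsymbol{s},\boldsymbol{h})$ and $p(\boldsymbol{h}|\boldsymbol{s},\boldsymbol{a})$ mentioned in the TL;DR can be illustrated with a purely classical stand-in. The sketch below is not the paper's method: it assumes a scalar action with a Gaussian prior and classical binary hidden units in place of quantum ones, and the function name `sample_action`, the toy energy, and all parameters (`w`, `u`, `b`, `mu`, `sigma`) are hypothetical choices made for illustration only.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_action(s, w, u, b, mu=0.0, sigma=1.0, n_sweeps=50, rng=None):
    """Alternate a ~ p(a | s, h) and h ~ p(h | s, a) for a scalar action.

    Assumed toy energy (NOT the paper's Hamiltonian):
        E(a, h | s) = (a - mu)^2 / (2 sigma^2) - sum_j h_j (w_j a + u_j s + b_j)
    with a Gaussian prior on the action and classical binary hidden units.
    """
    rng = rng or random.Random()
    h = [rng.randint(0, 1) for _ in w]
    a = mu
    for _ in range(n_sweeps):
        # p(a | s, h): Gaussian whose mean is shifted by the active hidden units.
        mean = mu + sigma ** 2 * sum(wj * hj for wj, hj in zip(w, h))
        a = rng.gauss(mean, sigma)
        # p(h_j = 1 | s, a): independent logistic units given state and action.
        h = [1 if rng.random() < sigmoid(wj * a + uj * s + bj) else 0
             for wj, uj, bj in zip(w, u, b)]
    return a, h


# Example: draw one action for state s = 0.5 with three hidden units.
action, hidden = sample_action(
    0.5, w=[1.0, -0.5, 0.2], u=[0.3, 0.1, -0.4], b=[0.0, 0.0, 0.0],
    rng=random.Random(0),
)
```

In this classical toy both conditionals are available in closed form. In the CSQBM setting the hidden-unit step would instead draw from a Gibbs state over qubits, so the logistic update here should be read only as a placeholder for that step.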

Paper Structure

This paper contains 12 sections, 2 theorems, 13 equations, 1 figure.

Key Result

Theorem 1

With $\boldsymbol{H}'(\boldsymbol{v})=\boldsymbol{H}^{v h}(\boldsymbol{v})+\boldsymbol{H}^{h}+\boldsymbol{H}^{hh}$ and $\rho'_{\boldsymbol{v}}=e^{-\beta\boldsymbol{H}'(\boldsymbol{v})}/\mathop{\mathrm{tr}}\nolimits\left[e^{-\beta\boldsymbol{H}'(\boldsymbol{v})}\right]$, it holds
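
For small systems, the Gibbs state $\rho'_{\boldsymbol{v}}$ defined in the theorem can be evaluated numerically by diagonalizing $\boldsymbol{H}'(\boldsymbol{v})$. The sketch below assumes two hidden qubits and a scalar visible value; the concrete forms chosen for $\boldsymbol{H}^{vh}$, $\boldsymbol{H}^{h}$ and $\boldsymbol{H}^{hh}$ (a linear coupling, a transverse field, and a $ZZ$ interaction) and the names `toy_hamiltonian`, `w`, `gamma`, `j` are hypothetical and are not taken from the paper.

```python
import numpy as np

# Pauli matrices and identity for building small qubit Hamiltonians.
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def gibbs_state(H, beta):
    """Return exp(-beta H) / tr[exp(-beta H)] for a Hermitian matrix H."""
    evals, evecs = np.linalg.eigh(H)
    # Shift by the smallest eigenvalue so the exponentials cannot overflow.
    weights = np.exp(-beta * (evals - evals.min()))
    weights /= weights.sum()
    # Reassemble V diag(weights) V^dagger.
    return (evecs * weights) @ evecs.conj().T


def toy_hamiltonian(v, w=(0.7, -0.4), gamma=0.5, j=0.3):
    """Hypothetical two-qubit H'(v) = H^{vh}(v) + H^{h} + H^{hh}."""
    z0, z1 = np.kron(Z, I2), np.kron(I2, Z)
    x0, x1 = np.kron(X, I2), np.kron(I2, X)
    H_vh = -v * (w[0] * z0 + w[1] * z1)  # assumed visible-hidden coupling, linear in v
    H_h = -gamma * (x0 + x1)             # assumed transverse field on hidden qubits
    H_hh = -j * (z0 @ z1)                # assumed hidden-hidden ZZ coupling
    return H_vh + H_h + H_hh


# Gibbs state of the hidden qubits for one visible value.
rho = gibbs_state(toy_hamiltonian(v=0.8), beta=1.0)
```

Exact diagonalization scales exponentially with the number of hidden qubits, so this is useful only as a sanity check on toy instances, not as a substitute for preparing the Gibbs state on quantum hardware.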

Figures (1)

  • Figure 1: Illustration of our proposed CSQBM and how it enables continuous-action $Q$-learning. We overcome the limitations of current AC approaches using SQBMs (a) by introducing theoretically sound continuous SQBMs (CSQBMs), which utilize exponential-family priors (e.g. Gaussian) (b). The best action is obtained by sampling from the hybrid quantum–classical distribution.

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2