Quantum Boltzmann Machines for Sample-Efficient Reinforcement Learning
Thore Gerlach, Michael Schenk, Verena Kain
TL;DR
This work tackles sample efficiency in continuous-action reinforcement learning by introducing Continuous Semi-Quantum Boltzmann Machines (CSQBMs), a hybrid quantum-classical energy-based framework that combines an exponential-family prior on the visible units with quantum hidden units to reduce qubit requirements while retaining expressiveness. It shows that gradients with respect to continuous inputs can be computed analytically, enabling direct integration into Actor-Critic schemes, and it replaces the difficult global maximization in $Q$-learning with sampling from the CSQBM distribution to perform continuous $Q$-learning. The theoretical contributions are a principled CSQBM formulation with tractable gradient propagation through the visible units, and a Gibbs-sampling-based inference mechanism that yields action samples by alternating between $p(\boldsymbol{a}|\boldsymbol{s},\boldsymbol{h})$ and $p(\boldsymbol{h}|\boldsymbol{s},\boldsymbol{a})$ under efficiently preparable Gibbs states. The proposed approach promises improved sample efficiency and expressiveness for challenging continuous control tasks, with potential applications to precision physics settings such as beamline control at CERN.
Abstract
We introduce theoretically grounded Continuous Semi-Quantum Boltzmann Machines (CSQBMs) that support continuous-action reinforcement learning. By combining exponential-family priors over visible units with quantum Boltzmann distributions over hidden units, CSQBMs yield a hybrid quantum-classical model that reduces qubit requirements while retaining strong expressiveness. Crucially, gradients with respect to continuous variables can be computed analytically, enabling direct integration into Actor-Critic algorithms. Building on this, we propose a continuous Q-learning framework that replaces global maximization with efficient sampling from the CSQBM distribution, thereby overcoming instability issues in continuous control.
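To make the idea of replacing global maximization with sampling concrete, the following is a minimal classical sketch, not the CSQBM itself. It assumes a Gaussian-Bernoulli restricted Boltzmann machine as a stand-in: unit-variance Gaussian action units and binary hidden units, where the actual model uses quantum hidden units. The energy function, the weight names `W_s`, `W_a`, `b`, and the use of the negative free energy as a proxy for the $Q$-value are all illustrative assumptions. Actions are drawn by alternating between $p(\boldsymbol{h}|\boldsymbol{s},\boldsymbol{a})$ and $p(\boldsymbol{a}|\boldsymbol{s},\boldsymbol{h})$.

```python
import numpy as np

# Assumed classical stand-in energy (hypothetical, not the paper's model):
#   E(s, a, h) = 0.5 * ||a||^2 - a^T W_a h - s^T W_s h - b^T h
# with binary hidden units h in {0, 1} and continuous actions a.


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gibbs_sample_actions(s, W_s, W_a, b, n_steps=50, n_chains=16, rng=None):
    """Draw action samples for a fixed state s by alternating Gibbs updates."""
    rng = np.random.default_rng() if rng is None else rng
    n_actions = W_a.shape[0]
    a = rng.standard_normal((n_chains, n_actions))  # random initial actions
    state_bias = s @ W_s + b  # state contribution to the hidden units
    for _ in range(n_steps):
        # p(h_j = 1 | s, a) is a sigmoid of the hidden pre-activation.
        p_h = sigmoid(a @ W_a + state_bias)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        # p(a | s, h) is Gaussian with mean W_a h and identity covariance.
        a = h @ W_a.T + rng.standard_normal(a.shape)
    return a


def free_energy(s, a, W_s, W_a, b):
    """Free energy F(s, a) after summing out the binary hidden units."""
    pre = a @ W_a + s @ W_s + b
    # logaddexp(0, x) is a numerically stable log(1 + exp(x)).
    return 0.5 * np.sum(a**2, axis=-1) - np.sum(np.logaddexp(0.0, pre), axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_state, n_actions, n_hidden = 3, 2, 4
    W_s = 0.1 * rng.standard_normal((n_state, n_hidden))
    W_a = 0.1 * rng.standard_normal((n_actions, n_hidden))
    b = np.zeros(n_hidden)
    s = np.ones(n_state)

    actions = gibbs_sample_actions(s, W_s, W_a, b, rng=rng)
    # Assumption: treat -F(s, a) as a Q-value proxy and keep the best sample.
    q_proxy = -free_energy(s, actions, W_s, W_a, b)
    best_action = actions[np.argmax(q_proxy)]
    print(best_action)
```

In this sketch, the samples follow $p(\boldsymbol{a}|\boldsymbol{s}) \propto \exp(-F(\boldsymbol{s},\boldsymbol{a}))$, so they concentrate where the free energy is low. Taking the best of a handful of samples therefore stands in for a global search over the continuous action space.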
