Warm-Start Variational Quantum Policy Iteration
Nico Meyer, Jakob Murauer, Alexander Popov, Christian Ufrecht, Axel Plinge, Christopher Mutschler, Daniel D. Scherer
TL;DR
This paper tackles speeding up reinforcement-learning policy iteration by embedding a variational, quantum-enhanced linear-system solver into the policy evaluation step. The proposed VarQPI framework, augmented with warm-start initialization (WS-VarQPI), leverages a variational LSE solver and $\ell_{\infty}$-tomography for efficient quantum-assisted policy evaluation and classical policy improvement. Empirical evidence on FrozenLake environments shows robust performance, with WS-VarQPI achieving notable reductions in training steps and enabling up-scaling to larger problems (e.g., $256\times256$ linear systems) while maintaining ground-truth accuracy. The work analyzes the sparsity and conditioning of typical RL-induced systems, argues for practical quantum advantage under realistic constraints, and highlights hardware-related bottlenecks and directions for future validation on quantum devices.
Abstract
Reinforcement learning is a powerful framework for determining optimal behavior in highly complex decision-making scenarios. This objective can be achieved using policy iteration, which requires solving a typically large linear system of equations. We propose the variational quantum policy iteration (VarQPI) algorithm, realizing this step with a NISQ-compatible quantum-enhanced subroutine. Its scalability is supported by an analysis of the structure of generic reinforcement learning environments, laying the foundation for potential quantum advantage with utility-scale quantum computers. Furthermore, we introduce the warm-start initialization variant (WS-VarQPI) that significantly reduces resource overhead. The algorithm solves a large FrozenLake environment with an underlying $256\times256$ linear system, indicating its practical robustness.
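To make the linear system in the policy evaluation step concrete, the following is a minimal classical sketch of policy iteration, not the authors' implementation. It assumes the standard formulation in which the state values $v$ of a fixed policy $\pi$ satisfy $(I - \gamma P_\pi) v = r_\pi$. Here `np.linalg.solve` merely stands in for the variational quantum solver, and the three-state chain MDP, the function names, and the discount factor `gamma=0.9` are hypothetical choices made for illustration; they are not taken from the paper.

```python
import numpy as np

def evaluate_policy(P, R, policy, gamma):
    """Policy evaluation: solve (I - gamma * P_pi) v = r_pi for v."""
    n_states = P.shape[0]
    idx = np.arange(n_states)
    P_pi = P[idx, policy]   # (S, S) transition matrix under the policy
    r_pi = R[idx, policy]   # (S,) expected immediate reward under the policy
    A = np.eye(n_states) - gamma * P_pi
    # Classical stand-in for the quantum-enhanced linear-system solver.
    return np.linalg.solve(A, r_pi)

def improve_policy(P, R, v, gamma):
    """Policy improvement: act greedily with respect to the action values."""
    q = R + gamma * (P @ v)  # (S, A) action values
    return np.argmax(q, axis=1)

def policy_iteration(P, R, gamma=0.9, max_iters=100):
    policy = np.zeros(P.shape[0], dtype=int)
    for _ in range(max_iters):
        v = evaluate_policy(P, R, policy, gamma)
        new_policy = improve_policy(P, R, v, gamma)
        if np.array_equal(new_policy, policy):
            break  # policy is stable, so v is its value function
        policy = new_policy
    return policy, v

# Hypothetical toy MDP (not FrozenLake): 3 states, actions 0 = left, 1 = right.
# State 2 is absorbing; moving right from state 1 yields reward 1.
P = np.zeros((3, 2, 3))   # P[s, a, s'] = transition probability
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 0] = 1.0
P[1, 1, 2] = 1.0
P[2, :, 2] = 1.0
R = np.zeros((3, 2))      # R[s, a] = expected immediate reward
R[1, 1] = 1.0

policy, v = policy_iteration(P, R)
print(policy, v)
```

The matrix `A` has one row and column per state, so its dimension grows with the size of the state space. This is the step that the abstract proposes to hand to a quantum-enhanced subroutine, while the greedy improvement step remains classical.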
