Quantum Lipschitz Bandits
Bongsoo Yi, Yue Kang, Yao Li
TL;DR
This work introduces the first quantum algorithms for Lipschitz bandits with continuous arm spaces, Q-LAE and Q-Zooming, which use quantum Monte Carlo to accelerate mean estimation over adaptive discretizations. Q-LAE follows an elimination-based strategy, while Q-Zooming applies a stage-based variant of the classical Zooming algorithm. Under standard noise models, both achieve regret of order $\tilde O(T^{d_z/(d_z+1)})$, where $d_z$ is the zooming dimension and $T$ is the time horizon, improving on the classical rate of $\tilde O(T^{(d_z+1)/(d_z+2)})$. Experiments on representative Lipschitz problems show the quantum methods outperforming classical Zooming, with Q-LAE often performing robustly in practice due to its aggressive elimination of suboptimal regions. The results illustrate how quantum subroutines can be combined with structured exploration strategies to improve learning in continuous-action bandits.
Abstract
The Lipschitz bandit is a key variant of stochastic bandit problems where the expected reward function satisfies a Lipschitz condition with respect to an arm metric space. Given its wide-ranging practical applications, various Lipschitz bandit algorithms have been developed, achieving cumulative regret of order $\tilde O(T^{(d_z+1)/(d_z+2)})$ over time horizon $T$, which matches the lower bound, where $d_z$ denotes the zooming dimension. Motivated by recent advancements in quantum computing and the demonstrated success of quantum Monte Carlo in simpler bandit settings, we introduce the first quantum Lipschitz bandit algorithms to address the challenges of continuous action spaces and non-linear reward functions. Specifically, we first leverage the elimination-based framework to propose an efficient quantum Lipschitz bandit algorithm named Q-LAE. Next, we present novel modifications to the classical Zooming algorithm, which result in a simple quantum Lipschitz bandit method, Q-Zooming. Both algorithms exploit the computational power of quantum methods to achieve an improved regret bound of $\tilde O(T^{d_z/(d_z+1)})$. Comprehensive experiments further support our theoretical improvements, demonstrating superior empirical performance compared to existing Lipschitz bandit methods.
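
To make the size of the improvement concrete, the short Python sketch below compares the two exponents of $T$ stated above, $(d_z+1)/(d_z+2)$ for classical methods and $d_z/(d_z+1)$ for the quantum algorithms. It is only an arithmetic illustration, not part of the paper's method: the function names `regret_exponents` and `exponent_gap` are hypothetical, the zooming dimension is assumed to be a non-negative integer for simplicity, and logarithmic factors and constants hidden by $\tilde O$ are ignored.

```python
from fractions import Fraction


def regret_exponents(d_z: int) -> tuple[Fraction, Fraction]:
    """Return the (classical, quantum) exponents of T for zooming dimension d_z.

    Assumption for this sketch: d_z is a non-negative integer, so the
    exponents can be represented exactly as fractions.
    """
    if d_z < 0:
        raise ValueError("zooming dimension must be non-negative")
    classical = Fraction(d_z + 1, d_z + 2)  # exponent in T^{(d_z+1)/(d_z+2)}
    quantum = Fraction(d_z, d_z + 1)        # exponent in T^{d_z/(d_z+1)}
    return classical, quantum


def exponent_gap(d_z: int) -> Fraction:
    """Return classical exponent minus quantum exponent.

    Algebraically this equals 1 / ((d_z + 1) * (d_z + 2)).
    """
    classical, quantum = regret_exponents(d_z)
    return classical - quantum


if __name__ == "__main__":
    for d_z in range(5):
        classical, quantum = regret_exponents(d_z)
        print(f"d_z={d_z}: classical T^({classical}), quantum T^({quantum}), "
              f"gap={exponent_gap(d_z)}")
```

The gap between the exponents is $1/((d_z+1)(d_z+2))$, so the quantum exponent is strictly smaller for every $d_z \ge 0$, and the difference shrinks as the zooming dimension grows.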
