Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits
Ziyi Huang, Henry Lam, Haofeng Zhang
TL;DR
The paper studies frequentist regret for Bayesian bandit algorithms that use approximate inference in stochastic linear contextual bandits. It develops a framework based on $\alpha$-divergences to bound the inference error between exact and approximate posteriors. Within this framework, LinTS and LinBUCB retain their original regret rates, with larger constants, as long as the inference error under two different $\alpha$-divergences is bounded. Under a new notion of well-behaved distributions, LinBUCB attains the minimax optimal rate $\tilde{O}(d\sqrt{T})$, improving on the $\tilde{O}(d^{3/2}\sqrt{T})$ rate of LinTS. Experiments corroborate the theory and show that approximate inference brings computational gains while still achieving sublinear regret.
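The inference error above is measured by $\alpha$-divergences between the exact posterior and its approximation. As a minimal illustration, the sketch below evaluates the Amari $\alpha$-divergence $D_\alpha(P\,\|\,Q) = \frac{1}{\alpha(1-\alpha)}\left(1 - \int p^\alpha q^{1-\alpha}\,dx\right)$ between two one-dimensional Gaussians in closed form. This convention, the Gaussian choice, and the function name `alpha_divergence_gaussian` are assumptions made for illustration; the summary does not state which convention or which two values of $\alpha$ the paper uses.

```python
import math


def alpha_divergence_gaussian(mu_p, var_p, mu_q, var_q, alpha):
    """Amari alpha-divergence D_alpha(P || Q) for P = N(mu_p, var_p), Q = N(mu_q, var_q).

    Assumed convention: D_alpha = (1 - int p^alpha q^(1 - alpha) dx) / (alpha * (1 - alpha)),
    defined for alpha not in {0, 1}. Illustrative sketch, not the paper's code.
    """
    if alpha in (0.0, 1.0):
        raise ValueError("alpha must differ from 0 and 1")
    # Precision-weighted mixture variance that appears when completing the square.
    var_mix = alpha * var_q + (1.0 - alpha) * var_p
    if var_mix <= 0.0:
        return math.inf  # the integral of p^alpha q^(1 - alpha) diverges for this alpha
    # Closed-form log of int p^alpha q^(1 - alpha) dx for two Gaussians.
    log_int = (
        0.5 * (1.0 - alpha) * math.log(var_p)
        + 0.5 * alpha * math.log(var_q)
        - 0.5 * math.log(var_mix)
        - alpha * (1.0 - alpha) * (mu_p - mu_q) ** 2 / (2.0 * var_mix)
    )
    return (1.0 - math.exp(log_int)) / (alpha * (1.0 - alpha))
```

For example, evaluating it at one $\alpha$ above 1 and one below 0 for $\mathcal{N}(0, 1)$ against $\mathcal{N}(0.5, 1.5)$ returns two finite, nonnegative numbers; a bounded-error assumption of the kind described above would require such values to stay below a fixed threshold. The divergence is infinite whenever $\alpha\,\mathrm{var}_q + (1-\alpha)\,\mathrm{var}_p \le 0$.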
Abstract
Bayesian bandit algorithms with approximate Bayesian inference have been widely used in real-world applications. Despite their strong practical performance, their theoretical justification has received less attention in the literature, especially for contextual bandit problems. To fill this gap, we propose a theoretical framework to analyze the impact of approximate inference in stochastic linear bandits and conduct frequentist regret analysis on two Bayesian bandit algorithms: Linear Thompson Sampling (LinTS) and Linear Bayesian Upper Confidence Bound (LinBUCB), an extension of Bayesian Upper Confidence Bound to the linear setting. We demonstrate that, when applied with approximate inference, LinTS and LinBUCB can universally preserve the rates of their original regret upper bounds, at the cost of larger constant terms. These results hold for general Bayesian inference approaches, provided the inference error measured by two different $\alpha$-divergences is bounded. Additionally, by introducing a new definition of well-behaved distributions, we show that LinBUCB improves the regret rate of LinTS from $\tilde{O}(d^{3/2}\sqrt{T})$ to $\tilde{O}(d\sqrt{T})$, matching the minimax optimal rate. To our knowledge, this work provides the first regret bounds for stochastic linear bandits with bounded approximate inference errors.
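A minimal sketch of how the two algorithms consume a (possibly approximate) posterior, assuming a Gaussian linear model with known noise level and a fixed arm set in place of per-round contexts. The covariance inflation in `approximate` is a hypothetical stand-in for an approximate-inference method, not the approximation analyzed in the paper, and the quantile level $1 - 1/(t+1)$ in `linbucb_choose` is an illustrative schedule rather than the paper's choice. All names (`LinearGaussianPosterior`, `approximate`, `lints_choose`, `linbucb_choose`, `run`) are hypothetical.

```python
import numpy as np
from statistics import NormalDist


class LinearGaussianPosterior:
    """Posterior N(B^{-1} f, v^2 B^{-1}) for Gaussian noise with std v and prior N(0, (v^2 / lam) I)."""

    def __init__(self, d, lam, v):
        self.B = lam * np.eye(d)  # regularized design matrix
        self.f = np.zeros(d)      # running sum of r_s * x_s
        self.v = v

    def update(self, x, r):
        self.B += np.outer(x, x)
        self.f += r * x

    def exact(self):
        mean = np.linalg.solve(self.B, self.f)
        cov = self.v ** 2 * np.linalg.inv(self.B)
        return mean, (cov + cov.T) / 2  # symmetrize against round-off


def approximate(mean, cov, scale=1.5):
    # Hypothetical stand-in for approximate inference: keep the mean, inflate the covariance.
    return mean, scale * cov


def lints_choose(arms, mean, cov, rng):
    # LinTS: draw one parameter from the (approximate) posterior and act greedily on it.
    theta = rng.multivariate_normal(mean, cov)
    return int(np.argmax(arms @ theta))


def linbucb_choose(arms, mean, cov, t):
    # LinBUCB-style index: upper quantile of x^T theta under the Gaussian (approximate) posterior.
    # The level 1 - 1/(t + 1) is an assumed schedule for illustration.
    z = NormalDist().inv_cdf(1.0 - 1.0 / (t + 1))
    stds = np.sqrt(np.einsum("ij,jk,ik->i", arms, cov, arms))
    return int(np.argmax(arms @ mean + z * stds))


def run(policy, theta_star, arms, T=2000, noise=0.1, seed=0):
    """Cumulative pseudo-regret of one policy ("lints" or "linbucb") on a fixed arm set."""
    rng = np.random.default_rng(seed)
    # lam = noise^2 gives prior N(0, I); v = noise makes exact() the true posterior for this model.
    post = LinearGaussianPosterior(arms.shape[1], lam=noise ** 2, v=noise)
    best = np.max(arms @ theta_star)
    regret = 0.0
    for t in range(1, T + 1):
        mean, cov = approximate(*post.exact())
        if policy == "lints":
            k = lints_choose(arms, mean, cov, rng)
        else:
            k = linbucb_choose(arms, mean, cov, t)
        x = arms[k]
        reward = x @ theta_star + noise * rng.standard_normal()
        post.update(x, reward)
        regret += best - x @ theta_star
    return regret
```

For example, `run("linbucb", np.array([1.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]))` returns the cumulative regret over 2000 rounds. Swapping in a different approximate posterior only requires changing `approximate`, which loosely reflects the abstract's point that the analysis is not tied to one inference method as long as the inference error stays bounded.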
