Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits

Ziyi Huang, Henry Lam, Haofeng Zhang

TL;DR

The paper studies the regret of Bayesian bandit algorithms that use approximate inference in stochastic linear contextual bandits. It develops a framework based on $\alpha$-divergences to bound the inference error between exact and approximate posteriors, and shows that LinTS and LinBUCB retain their original regret rates when this error, measured by two $\alpha$-divergences, is bounded. By introducing a notion of well-behaved distributions, it shows that LinBUCB achieves the minimax-optimal rate $\tilde{O}(d\sqrt{T})$, improving on the $\tilde{O}(d^{3/2}\sqrt{T})$ rate of LinTS. Experiments corroborate the theory and show that approximate inference reduces computation while keeping regret sublinear.
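To make the $\alpha$-divergence concrete, the sketch below evaluates it in closed form for one-dimensional Gaussians standing in for an exact posterior $p$ and an approximate posterior $q$. It assumes one common (Amari-type) parameterization, $D_\alpha(p\|q) = \frac{1}{\alpha(\alpha-1)}\left(\int p^\alpha q^{1-\alpha}\,dx - 1\right)$, which tends to $\mathrm{KL}(p\|q)$ as $\alpha \to 1$; the exact normalization and the two $\alpha$ values used in the paper are not given in this excerpt. All function names are hypothetical.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def chernoff_coefficient(mu_p: float, sigma_p: float, mu_q: float, sigma_q: float,
                         alpha: float) -> float:
    """Closed form of the integral of p(x)^alpha * q(x)^(1 - alpha) for 1-D Gaussians.

    The integral is finite only when alpha * sigma_q^2 + (1 - alpha) * sigma_p^2 > 0;
    otherwise it diverges and math.inf is returned.
    """
    var_alpha = alpha * sigma_q**2 + (1.0 - alpha) * sigma_p**2
    if var_alpha <= 0.0:
        return math.inf
    scale = sigma_p ** (1.0 - alpha) * sigma_q**alpha / math.sqrt(var_alpha)
    return scale * math.exp(-alpha * (1.0 - alpha) * (mu_p - mu_q) ** 2 / (2.0 * var_alpha))


def alpha_divergence(mu_p, sigma_p, mu_q, sigma_q, alpha):
    """Amari-type D_alpha(p || q) = (int p^a q^(1-a) dx - 1) / (a (a - 1)).

    Defined here for alpha not in {0, 1}; those two values are the KL limits.
    """
    if alpha in (0.0, 1.0):
        raise ValueError("alpha must differ from 0 and 1; use kl_gaussian for the limits")
    c = chernoff_coefficient(mu_p, sigma_p, mu_q, sigma_q, alpha)
    return (c - 1.0) / (alpha * (alpha - 1.0))


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for 1-D Gaussians: the alpha -> 1 limit of alpha_divergence."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p**2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q**2) - 0.5)


# Hypothetical example: exact posterior N(0, 1) vs. approximate posterior N(0.3, 0.9^2).
for a in (-1.0, 0.5, 2.0):
    print(a, alpha_divergence(0.0, 1.0, 0.3, 0.9, a))
```

For $\alpha > 1$ the divergence becomes infinite once $q$ is sufficiently narrower than $p$ (for $\alpha = 2$, when $2\sigma_q^2 \le \sigma_p^2$), so which $\alpha$ is bounded matters.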

Abstract

Bayesian bandit algorithms with approximate Bayesian inference have been widely used in real-world applications. Despite their strong practical performance, their theoretical justification has received less attention in the literature, especially for contextual bandit problems. To fill this gap, we propose a theoretical framework to analyze the impact of approximate inference in stochastic linear bandits and conduct a frequentist regret analysis of two Bayesian bandit algorithms: Linear Thompson Sampling (LinTS) and Linear Bayesian Upper Confidence Bound (LinBUCB), an extension of Bayesian Upper Confidence Bound. We demonstrate that, when applied with approximate inference, LinTS and LinBUCB universally preserve the rates of their original regret upper bounds, at the cost of larger constant terms. These results hold for general Bayesian inference approaches, provided the inference error measured by two different $\alpha$-divergences is bounded. Additionally, by introducing a new definition of well-behaved distributions, we show that LinBUCB improves the regret rate of LinTS from $\tilde{O}(d^{3/2}\sqrt{T})$ to $\tilde{O}(d\sqrt{T})$, matching the minimax optimal rate. To our knowledge, this work provides the first regret bounds in the setting of stochastic linear bandits with bounded approximate inference errors.
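As a rough illustration of LinTS with an approximate posterior, the sketch below assumes a conjugate Gaussian model (Gaussian prior, Gaussian noise), so the exact posterior is available in closed form, and imitates inference error by rescaling its covariance with a hypothetical `approx_scale` factor. This is neither the paper's experimental setup nor its approximation scheme; `gaussian_posterior`, `lints_select` and `run_lints` are illustrative names only.

```python
import numpy as np


def gaussian_posterior(X, r, lam=1.0, noise_sd=1.0):
    """Exact Gaussian posterior N(mean, cov) for Bayesian linear regression.

    Assumes the prior theta ~ N(0, (noise_sd**2 / lam) I) and Gaussian reward noise with
    standard deviation noise_sd, so mean is the ridge (RLS) estimate and
    cov = noise_sd**2 * V^{-1} with V = lam I + X^T X.
    """
    d = X.shape[1]
    V = lam * np.eye(d) + X.T @ X
    mean = np.linalg.solve(V, X.T @ r)
    cov = noise_sd**2 * np.linalg.inv(V)
    return mean, (cov + cov.T) / 2.0  # symmetrize away round-off


def lints_select(arms, mean, cov, rng, approx_scale=1.0):
    """One LinTS step: draw theta from a (possibly approximate) posterior, play its best arm.

    approx_scale is a hypothetical stand-in for inference error: the "approximate posterior"
    is N(mean, approx_scale * cov), which over-disperses (> 1) or under-disperses (< 1)
    the exact posterior.
    """
    theta = rng.multivariate_normal(mean, approx_scale * cov)
    return int(np.argmax(arms @ theta))


def run_lints(T=500, d=5, K=10, approx_scale=1.0, seed=0):
    """Simulate LinTS on a synthetic linear bandit; return cumulative pseudo-regret."""
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=d) / np.sqrt(d)  # synthetic true parameter
    X, r = np.empty((0, d)), np.empty(0)
    regret = np.zeros(T)
    for t in range(T):
        arms = rng.normal(size=(K, d)) / np.sqrt(d)  # fresh contexts every round
        mean, cov = gaussian_posterior(X, r)
        a = lints_select(arms, mean, cov, rng, approx_scale)
        expected = arms @ theta_star
        X = np.vstack([X, arms[a]])
        r = np.append(r, expected[a] + rng.normal())  # unit-variance Gaussian noise
        regret[t] = expected.max() - expected[a]
    return np.cumsum(regret)


exact = run_lints(approx_scale=1.0)
approx = run_lints(approx_scale=2.0)
print(exact[-1], approx[-1])
```

Comparing the two final values across several seeds gives an informal feel for how a mis-dispersed posterior affects cumulative regret; it does not reproduce the paper's figures.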

Paper Structure (25 sections, 12 theorems, 128 equations, 4 figures, 3 tables, 2 algorithms)

Key Result

Proposition 2.2

Suppose Assumption linearassu holds. Consider the $\mathcal{F}_t^x$-adapted sequence $(x_1, \ldots, x_t)$ and the RLS estimator $\hat{\theta}_t$ defined above. For any $\delta \in (0,1)$, with probability at least $1-\delta$ (with respect to the noise $\{\xi_t\}_t$ and any source of randomization …) … for any $t \geq 1$ and any $x \in \mathbb{R}^d$, where … Moreover, for any arbitrary sequence $(x_1, …$
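The proposition builds on the self-normalized confidence ellipsoid of Abbasi-Yadkori et al. (2011). The sketch below computes the RLS estimator $\hat{\theta}_t = V_t^{-1}\sum_s x_s r_s$ with $V_t = \lambda I + \sum_s x_s x_s^\top$ and the radius $\beta_t(\delta) = R\sqrt{2\log\left(\det(V_t)^{1/2}\det(\lambda I)^{-1/2}/\delta\right)} + \sqrt{\lambda}\,S$ from that work, assuming $R$-sub-Gaussian noise and $\|\theta^\star\|_2 \le S$. The constants in this paper's Proposition 2.2 are cut off in this excerpt, so treat this form as an assumption; the function names are hypothetical.

```python
import numpy as np


def rls_estimate(X, r, lam=1.0):
    """Regularized least squares: theta_hat = V^{-1} X^T r, with V = lam I + X^T X."""
    d = X.shape[1]
    V = lam * np.eye(d) + X.T @ X
    return np.linalg.solve(V, X.T @ r), V


def beta(V, lam, delta, R=1.0, S=1.0):
    """Confidence radius in the form of Abbasi-Yadkori et al. (2011, Theorem 2).

    beta = R * sqrt(2 * log(det(V)^{1/2} * det(lam I)^{-1/2} / delta)) + sqrt(lam) * S,
    valid for R-sub-Gaussian noise and ||theta_star||_2 <= S.
    """
    d = V.shape[0]
    _, logdet_V = np.linalg.slogdet(V)  # V is positive definite, so the sign is +1
    log_ratio = 0.5 * logdet_V - 0.5 * d * np.log(lam)
    return R * np.sqrt(2.0 * (log_ratio - np.log(delta))) + np.sqrt(lam) * S


def confidence_width(x, V):
    """||x||_{V^{-1}} = sqrt(x^T V^{-1} x)."""
    return float(np.sqrt(x @ np.linalg.solve(V, x)))


# Synthetic check of |x^T theta_hat - x^T theta_star| <= beta * ||x||_{V^{-1}}.
rng = np.random.default_rng(0)
theta_star = np.array([0.6, -0.8])  # ||theta_star||_2 = 1 = S
X = rng.normal(size=(200, 2))
r = X @ theta_star + rng.normal(size=200)  # N(0, 1) noise is 1-sub-Gaussian
theta_hat, V = rls_estimate(X, r)
b = beta(V, lam=1.0, delta=0.05)
for x in (np.array([1.0, 0.0]), np.array([1.0, 1.0])):
    print(abs(x @ (theta_hat - theta_star)), "<=", b * confidence_width(x, V))
```

The guarantee holds simultaneously for all $t$ and $x$ with probability at least $1-\delta$, so the printed inequalities should fail only on a small fraction of random seeds.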

Figures (4)

  • Figure 1: Results of LinBUCB, LinTS, LinBUCB_Approximate, and LinTS_Approximate under different problem settings. Results are averaged over 10 runs with shaded standard errors.
  • Figure 2: Results of LinBUCB, LinTS, LinBUCB_Approximate, and LinTS_Approximate under different feature dimensions $d$ on Problem Setting $P3$, with the number of arms fixed at $K = 10$. Results are averaged over 10 runs with shaded standard errors.
  • Figure 3: Results of LinBUCB, LinTS, LinBUCB_Approximate, and LinTS_Approximate under different numbers of arms $K$ on Problem Setting $P3$, with the feature dimension fixed at $d = 20$. Results are averaged over 10 runs with shaded standard errors.
  • Figure 4: Sensitivity analysis of the quantile level $\gamma$ in LinBUCB (a) and LinBUCB_Approximate (b) on Problem Setting $P3$ ($d = 20$, $K = 10$, $T = 1000$). Results are averaged over 5 runs with shaded standard errors.
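Figure 4 varies the quantile level $\gamma$ used by LinBUCB. A minimal sketch of a quantile-based (Bayes-UCB-style) index follows. It assumes a fixed $\gamma$ and scores each arm $x$ by the $\gamma$-quantile of $x^\top\theta$ under the exact or approximate posterior; for a Gaussian posterior $N(\mu, \Sigma)$ this is $x^\top\mu + \Phi^{-1}(\gamma)\sqrt{x^\top\Sigma x}$. Whether the paper uses a fixed or time-varying $\gamma$, and how it computes the quantile, is not stated in this excerpt; the function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def linbucb_index_gaussian(arms, mean, cov, gamma):
    """gamma-quantile of x^T theta for each arm row x, when theta ~ N(mean, cov)."""
    z = NormalDist().inv_cdf(gamma)
    sd = np.sqrt(np.einsum("kd,de,ke->k", arms, cov, arms))  # sqrt(x^T cov x) per arm
    return arms @ mean + z * sd


def linbucb_index_samples(arms, theta_samples, gamma):
    """Empirical gamma-quantile of x^T theta from posterior samples of shape (n, d).

    Usable for any approximate posterior that can be sampled (e.g. variational or MCMC).
    """
    return np.quantile(theta_samples @ arms.T, gamma, axis=0)


def linbucb_select(index):
    """Play the arm with the largest quantile index."""
    return int(np.argmax(index))


# Hypothetical posterior: arm 0 looks better on average, arm 1 is far more uncertain.
arms = np.eye(2)
mean, cov = np.array([0.2, 0.1]), np.diag([0.01, 1.0])
for gamma in (0.5, 0.9):
    print(gamma, linbucb_select(linbucb_index_gaussian(arms, mean, cov, gamma)))
```

With $\gamma = 0.5$ the index equals the posterior mean and the rule picks arm 0; with $\gamma = 0.9$ the larger posterior uncertainty of arm 1 lifts its index above arm 0's, so the rule picks arm 1. The sample-based variant covers approximate posteriors that are available only through samples.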

Theorems & Definitions (22)

  • Proposition 2.2: abbasi2011improved, abeille2017linear
  • Proposition 2.4: abeille2017linear
  • Proposition 2.6
  • Definition 3.1
  • Theorem 3.3: Regret of LinTS
  • Theorem 3.4: Regret of LinBUCB without approximate inference
  • Theorem 3.5: Regret of LinBUCB with approximate inference
  • Theorem 3.7
  • Theorem 3.8
  • Proof of Proposition prop_D2
  • ...and 12 more