Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits

Ziyi Huang; Henry Lam; Haofeng Zhang

TL;DR

The paper addresses regret analysis for Bayesian bandit algorithms with approximate inference in stochastic linear contextual bandits. It develops a framework that measures the inference error between exact and approximate posteriors with $\alpha$-divergences, and shows that LinTS and LinBUCB retain their original regret rates, up to larger constants, when two such divergences are bounded. By introducing a notion of well-behaved distributions, it shows that LinBUCB achieves the minimax rate $\tilde{O}(d\sqrt{T})$, improving on the $\tilde{O}(d^{3/2}\sqrt{T})$ rate of LinTS. Experiments corroborate the theory and demonstrate computational gains from approximate inference without sacrificing sublinear regret.
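
As a concrete reference point for the divergence-based error measure, the following sketch computes an $\alpha$-divergence between two one-dimensional Gaussians, standing in for an exact posterior $p$ and an approximate posterior $q$. It assumes Amari's parameterization, $D_\alpha(p\|q) = \frac{1 - \int p^\alpha q^{1-\alpha}\,dx}{\alpha(1-\alpha)}$, which may differ in scaling or direction from the definition used in the paper; the function name and the Gaussian setting are illustrative assumptions. For $\alpha > 1$ or $\alpha < 0$ the integral can diverge, which the code reports as infinity.

```python
import math


def amari_alpha_divergence_gauss1d(mu_p, sigma_p, mu_q, sigma_q, alpha):
    """Alpha-divergence D_alpha(p || q) for p = N(mu_p, sigma_p^2), q = N(mu_q, sigma_q^2).

    Assumed convention (Amari), which may differ from the paper's:
        D_alpha(p || q) = (1 - int p^alpha q^(1 - alpha) dx) / (alpha * (1 - alpha)),
    defined for alpha not in {0, 1}.
    """
    if alpha in (0.0, 1.0):
        raise ValueError("alpha must differ from 0 and 1 (those limits are KL divergences)")
    # Mixed variance; the integral is finite only when it is strictly positive.
    var_mix = alpha * sigma_q**2 + (1.0 - alpha) * sigma_p**2
    if var_mix <= 0.0:
        return math.inf  # int p^alpha q^(1 - alpha) dx diverges, so the divergence is infinite
    # Closed form of int p^alpha q^(1 - alpha) dx for two Gaussians.
    integral = (
        sigma_p ** (1.0 - alpha) * sigma_q**alpha / math.sqrt(var_mix)
        * math.exp(-alpha * (1.0 - alpha) * (mu_p - mu_q) ** 2 / (2.0 * var_mix))
    )
    return (1.0 - integral) / (alpha * (1.0 - alpha))


if __name__ == "__main__":
    # Exact posterior N(0, 1) vs. a hypothetical approximate posterior N(0.1, 1.2^2).
    for a in (2.0, -1.0, 0.5):
        print(a, amari_alpha_divergence_gauss1d(0.0, 1.0, 0.1, 1.2, a))
```

As $\alpha \to 1$ the value approaches $\mathrm{KL}(p\|q)$ under this convention.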

Abstract

Bayesian bandit algorithms with approximate Bayesian inference have been widely used in real-world applications. Despite their strong practical performance, their theoretical justification has received less attention in the literature, especially for contextual bandit problems. To fill this gap, we propose a theoretical framework to analyze the impact of approximate inference in stochastic linear bandits and conduct frequentist regret analysis on two Bayesian bandit algorithms: Linear Thompson Sampling (LinTS) and Linear Bayesian Upper Confidence Bound (LinBUCB), an extension of Bayesian Upper Confidence Bound to the linear setting. We demonstrate that when applied in approximate inference settings, LinTS and LinBUCB universally preserve the rates of their original regret upper bounds, at the cost of larger constant terms. These results hold for general Bayesian inference approaches, provided that the inference error, measured by two different $\alpha$-divergences, is bounded. Additionally, by introducing a new definition of well-behaved distributions, we show that LinBUCB improves on the $\tilde{O}(d^{3/2}\sqrt{T})$ regret rate of LinTS, attaining $\tilde{O}(d\sqrt{T})$, which matches the minimax optimal rate. To our knowledge, this work provides the first regret bounds in the setting of stochastic linear bandits with bounded approximate inference errors.
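
To make the two selection rules concrete, here is a minimal sketch of one decision step of each algorithm, assuming the approximate posterior over $\theta$ is Gaussian $\mathcal{N}(\mu, \Sigma)$, as a variational or Laplace approximation might produce. LinTS plays the arm that is best for a single posterior draw; the LinBUCB-style rule plays the arm whose reward $x^\top\theta$ has the largest posterior $\gamma$-quantile. The function names, the fixed quantile level `gamma` (the parameter varied in Figure 4), and the toy arms are assumptions for illustration; the paper's quantile schedule and inference procedure are not reproduced here.

```python
from statistics import NormalDist

import numpy as np


def lints_select(arms, mean, cov, rng):
    """LinTS step (sketch): draw one theta from the Gaussian (approximate)
    posterior N(mean, cov) and play the arm maximizing x^T theta."""
    theta_sample = rng.multivariate_normal(mean, cov)
    return int(np.argmax(arms @ theta_sample))


def linbucb_select(arms, mean, cov, gamma):
    """LinBUCB-style step (sketch): play the arm whose posterior reward x^T theta
    has the largest gamma-quantile. Under N(mean, cov), x^T theta is
    N(x^T mean, x^T cov x), so the quantile has a closed form."""
    z = NormalDist().inv_cdf(gamma)  # standard normal gamma-quantile
    widths = np.sqrt(np.einsum("ij,jk,ik->i", arms, cov, arms))  # sqrt(x^T cov x) per arm
    return int(np.argmax(arms @ mean + z * widths))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    arms = np.array([[1.0, 0.0], [0.0, 1.0]])  # K = 2 toy arms in d = 2
    mean = np.array([1.0, 0.0])                # hypothetical posterior mean
    cov = np.diag([0.01, 4.0])                 # hypothetical posterior covariance
    print(lints_select(arms, mean, cov, rng))
    print(linbucb_select(arms, mean, cov, gamma=0.95))  # prints 1: the uncertain arm wins
```

For a non-Gaussian approximate posterior, the quantile generally has no closed form and would have to be estimated, for example from posterior samples.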

Paper Structure (25 sections, 12 theorems, 128 equations, 4 figures, 3 tables, 2 algorithms)

Introduction
Methodology
LinTS with Approximate Inference
LinBUCB with Approximate Inference
Regret Analysis
The Alpha Divergence
Finite-Time Regret Bound of LinTS
Finite-Time Regret Bound of LinBUCB
Negative Results
Experiments
Conclusions
Proof
Proofs of Results in the Section on LinBUCB with Approximate Inference
Proofs of Results in the Section on the Finite-Time Regret Bound of LinTS
Proofs of Results in the Section on the Finite-Time Regret Bound of LinBUCB
...and 10 more sections

Key Result

Proposition 2.2

Suppose Assumption linearassu holds. Consider the $\mathcal{F}_t^x$-adapted sequence $(x_1, \dots, x_t)$ and the RLS estimator $\hat{\theta}_t$ defined above. For any $\delta \in (0,1)$, with probability at least $1-\delta$ (with respect to the noise $\{\xi_t\}_t$ and any source of randomization i…) … for any $t \geq 1$ and any $x \in \mathbb{R}^d$, where … Moreover, for any arbitrary sequence $(x_1, \dots)$ …
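
The basic quantities in Proposition 2.2 can be computed directly. The sketch below assumes the standard ridge form of the RLS estimator, $V_t = \lambda I + \sum_{s \le t} x_s x_s^\top$ and $\hat{\theta}_t = V_t^{-1} \sum_{s \le t} x_s r_s$, where the reward notation $r_s$ and the value of $\lambda$ are hypothetical choices; it also computes the weighted norm $\|x\|_{V_t^{-1}} = \sqrt{x^\top V_t^{-1} x}$ that scales confidence widths in results of this type. The confidence radius and the exact bounds of the proposition are not reproduced.

```python
import numpy as np


def rls_estimate(X, r, lam=1.0):
    """Regularized least-squares estimate from t rounds of data (sketch).

    X   : (t, d) array whose rows are the played features x_1, ..., x_t
    r   : (t,) array of observed rewards
    lam : ridge parameter lambda (hypothetical default, not from the paper)
    Returns (theta_hat, V) with V = lam * I + sum_s x_s x_s^T.
    """
    d = X.shape[1]
    V = lam * np.eye(d) + X.T @ X            # Gram matrix V_t
    theta_hat = np.linalg.solve(V, X.T @ r)  # V_t^{-1} sum_s x_s r_s
    return theta_hat, V


def weighted_norm(x, V):
    """Return ||x||_{V^{-1}} = sqrt(x^T V^{-1} x)."""
    return float(np.sqrt(x @ np.linalg.solve(V, x)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta_star = np.array([0.5, -0.2, 0.1])             # toy ground-truth parameter
    X = rng.normal(size=(200, 3))
    r = X @ theta_star + 0.1 * rng.normal(size=200)     # rewards with Gaussian noise
    theta_hat, V = rls_estimate(X, r)
    x = np.ones(3)
    print(theta_hat, abs(x @ (theta_hat - theta_star)), weighted_norm(x, V))
```

Because $V_t$ only grows as rounds are added, $\|x\|_{V_t^{-1}}$ is non-increasing in $t$ for any fixed $x$, so the confidence width along every direction shrinks as data accumulates.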

Figures (4)

Figure 1: Results of LinBUCB, LinTS, LinBUCB_Approximate, and LinTS_Approximate under different problem settings. Results are averaged over 10 runs, with standard errors shown as shaded bands.
Figure 2: Results of LinBUCB, LinTS, LinBUCB_Approximate, and LinTS_Approximate under different feature dimensions $d$ on Problem Setting $P3$ with a fixed number of arms $K = 10$. Results are averaged over 10 runs, with standard errors shown as shaded bands.
Figure 3: Results of LinBUCB, LinTS, LinBUCB_Approximate, and LinTS_Approximate under different numbers of arms $K$ on Problem Setting $P3$ with a fixed feature dimension $d = 20$. Results are averaged over 10 runs, with standard errors shown as shaded bands.
Figure 4: Sensitivity analysis of the quantile $\gamma$ in LinBUCB (a) and LinBUCB_Approximate (b) on Problem Setting $P3$ ($d = 20$, $K = 10$, $T = 1000$). Results are averaged over 5 runs, with standard errors shown as shaded bands.

Theorems & Definitions (22)

Proposition 2.2: Abbasi-Yadkori et al. (2011); Abeille and Lazaric (2017)
Proposition 2.4: Abeille and Lazaric (2017)
Proposition 2.6
Definition 3.1
Theorem 3.3: Regret of LinTS
Theorem 3.4: Regret of LinBUCB without approximate inference
Theorem 3.5: Regret of LinBUCB with approximate inference
Theorem 3.7
Theorem 3.8
Proof of Proposition prop_D2
...and 12 more
