Finite-Time Logarithmic Bayes Regret Upper Bounds

Alexia Atsidakou; Branislav Kveton; Sumeet Katariya; Constantine Caramanis; Sujay Sanghavi

TL;DR

The paper addresses finite-time logarithmic Bayes regret for Bayesian bandits and introduces BayesUCB, a Bayesian UCB algorithm. It provides two main bounds, $O(c_\Delta \log n)$ and $O(c_h \log^2 n)$, valid across Gaussian, Bernoulli, and linear-Gaussian settings, with the latter matching Lai's lower bound asymptotically. The analysis first bounds per-instance regret (as in frequentist analyses) and then integrates over the random gaps, leveraging biased Bayesian confidence intervals to achieve logarithmic growth in horizon $n$. Experiments in Gaussian and linear bandits illustrate the advantage of incorporating prior information, showing improvements over traditional $\tilde{O}(\sqrt{n})$ Bayes regret bounds and demonstrating practical impact for prior-aware decision making.

Abstract

We derive the first finite-time logarithmic Bayes regret upper bounds for Bayesian bandits. In a multi-armed bandit, we obtain $O(c_\Delta \log n)$ and $O(c_h \log^2 n)$ upper bounds for an upper confidence bound algorithm, where $c_h$ and $c_\Delta$ are constants depending on the prior distribution and the gaps of bandit instances sampled from it, respectively. The latter bound asymptotically matches the lower bound of Lai (1987). Our proofs are a major technical departure from prior works, while being simple and general. To show the generality of our techniques, we apply them to linear bandits. Our results provide insights on the value of the prior in the Bayesian setting, both in the objective and as side information given to the learner. They significantly improve upon existing $\tilde{O}(\sqrt{n})$ bounds, which have become standard in the literature despite the logarithmic lower bound of Lai (1987).
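To make the setting concrete, here is a minimal sketch of a Bayesian upper confidence bound algorithm in a $K$-armed Gaussian bandit. It is not taken from the paper: it assumes a Gaussian prior $\mathcal{N}(\mu_{0,a}, \sigma_0^2)$ on each arm mean, the standard conjugate Gaussian posterior, and a confidence width of $\sqrt{2 \hat{\sigma}^2 \log(1 / \delta)}$, where $\hat{\sigma}^2$ is the posterior variance. The function names `bayes_ucb_gaussian` and `bayes_regret` are hypothetical. The Bayes regret is estimated by averaging the regret over bandit instances sampled from the prior.

```python
import math
import random


def bayes_ucb_gaussian(mu0, sigma0, sigma, n, delta, rng):
    """Regret of a Bayesian UCB sketch on one instance sampled from the prior.

    Assumptions (not from the source): prior N(mu0[a], sigma0^2) per arm,
    reward noise N(0, sigma^2), index = posterior mean + sqrt(2 * var * log(1/delta)).
    """
    K = len(mu0)
    theta = [rng.gauss(m, sigma0) for m in mu0]  # arm means drawn from the prior
    best = max(theta)
    counts = [0] * K   # number of pulls of each arm
    sums = [0.0] * K   # sum of observed rewards of each arm
    regret = 0.0
    for _ in range(n):
        ucbs = []
        for a in range(K):
            # Conjugate Gaussian posterior of the mean of arm a.
            post_var = 1.0 / (1.0 / sigma0**2 + counts[a] / sigma**2)
            post_mean = post_var * (mu0[a] / sigma0**2 + sums[a] / sigma**2)
            ucbs.append(post_mean + math.sqrt(2.0 * post_var * math.log(1.0 / delta)))
        a = max(range(K), key=ucbs.__getitem__)  # pull the arm with the highest index
        reward = rng.gauss(theta[a], sigma)
        counts[a] += 1
        sums[a] += reward
        regret += best - theta[a]
    return regret


def bayes_regret(mu0, sigma0, sigma, n, delta, runs, seed=0):
    """Monte Carlo estimate of the n-round Bayes regret (average over sampled instances)."""
    rng = random.Random(seed)
    total = sum(bayes_ucb_gaussian(mu0, sigma0, sigma, n, delta, rng) for _ in range(runs))
    return total / runs
```

For example, `bayes_regret([0.0, 0.0, 0.0], 1.0, 1.0, 200, 1.0 / 200, 10)` averages the regret over 10 sampled three-armed instances with horizon 200. Note that the prior enters twice: the instances are drawn from it, and the learner uses it to form its posterior.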

Paper Structure

This paper contains 30 sections, 11 theorems, 74 equations, 3 figures, and 1 algorithm.

Introduction
Setting
Algorithm
Gaussian Bandit
Bernoulli Bandit
Linear Bandit with Gaussian Rewards
Logarithmic Bayes Regret Upper Bounds
$\tt BayesUCB$ in Gaussian Bandit
$\tt UCB1$ in Gaussian Bandit
$\tt BayesUCB$ in Bernoulli Bandit
$\tt BayesUCB$ in Linear Bandit
Comparison to Prior Works
Matching Lower Bound
Prior Bayes Regret Upper Bounds
Technical Novelty
...and 15 more sections

Key Result

Theorem 1

For any $\varepsilon > 0$ and $\delta \in (0, 1)$, the $n$-round Bayes regret of $\tt BayesUCB$ in a $K$-armed Gaussian bandit is bounded by a leading term plus $C$, where $C = \varepsilon n + 2 (\sqrt{2 \log(1 / \delta)} + 2 K) \sigma_0 K n \delta$ is a low-order term.
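To see why $C$ is low order, the short sketch below evaluates it numerically. The function names are hypothetical, and the choice $\varepsilon = \delta = 1 / n$ is an illustrative assumption made here, not a setting confirmed by this summary. Under that choice, $C = 1 + 2 (\sqrt{2 \log n} + 2 K) \sigma_0 K$, which grows only as $\sqrt{\log n}$.

```python
import math


def low_order_term(eps, delta, n, K, sigma0):
    """C = eps * n + 2 * (sqrt(2 * log(1 / delta)) + 2 * K) * sigma0 * K * n * delta."""
    return eps * n + 2.0 * (math.sqrt(2.0 * math.log(1.0 / delta)) + 2.0 * K) * sigma0 * K * n * delta


def low_order_term_inverse_n(n, K, sigma0):
    """C under the illustrative assumption eps = delta = 1 / n."""
    return low_order_term(1.0 / n, 1.0 / n, n, K, sigma0)


if __name__ == "__main__":
    # Print C for increasing horizons to show its slow growth in n.
    for n in (10**3, 10**4, 10**5, 10**6):
        print(n, low_order_term_inverse_n(n, K=2, sigma0=1.0))
```

With a constant $\delta$ instead, the last factor $n \delta$ would make $C$ linear in $n$, so $\delta$ has to shrink with the horizon for $C$ to stay low order.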

Figures (3)

Figure 1: Gaussian bandit as (a) the prior width $\sigma_0$ and (b) the prior gap $\Delta_0$ change.
Figure 2: Linear bandit as (a) the prior width $\sigma_0$ and (b) the prior gap $\Delta_0$ change.
Figure 3: The difference in regret of $\tt UCB1$ and $\tt BayesUCB$ on $81$ Bayesian bandit instances, sorted by the difference. In plot (a), the noise is Gaussian $\mathcal{N}(0, \sigma^2)$. In plot (b), the noise is $\sigma$ with probability $0.5$ and $- \sigma$ otherwise.
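The two noise models named in the Figure 3 caption can be sketched as follows. This is an illustration only, with hypothetical function names, and it does not reproduce the experiment. Both models have zero mean and variance $\sigma^2$, but the second takes only the values $\sigma$ and $-\sigma$ and is therefore not Gaussian.

```python
import random


def gaussian_noise(sigma, rng):
    """Noise as in plot (a): a draw from N(0, sigma^2)."""
    return rng.gauss(0.0, sigma)


def two_point_noise(sigma, rng):
    """Noise as in plot (b): sigma with probability 0.5 and -sigma otherwise."""
    return sigma if rng.random() < 0.5 else -sigma


def second_moment(samples):
    """Empirical second moment, which estimates the variance of zero-mean noise."""
    return sum(x * x for x in samples) / len(samples)
```

The second moment of the two-point noise equals $\sigma^2$ exactly on every sample, while for the Gaussian noise it matches $\sigma^2$ only on average.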

Theorems & Definitions (16)

Theorem 1
Corollary 2
Lemma 3
Theorem 4
Theorem 5
Theorem 6
Lemma 7
proof
Lemma 8
proof
...and 6 more
