
On Lai's Upper Confidence Bound in Multi-Armed Bandits

Huachen Ren, Cun-Hui Zhang

TL;DR

The paper provides sharp nonasymptotic regret bounds for two Lai-type UCB policies in Gaussian $K$-armed bandits: (i) a UCB rule with a fixed exploration level $b_{T'}$, whose bound has a leading constant matching the Lai–Robbins lower bound, and (ii) Lai's UCB index, whose exploration function $g(x)$ is linked to $\log x$ and decreases with the sample size of the arm. Both bounds control second-order terms explicitly through Brownian boundary-crossing analyses. The authors' analysis combines boundary-crossing probabilities of random walks, viewed as repeated significance tests, with nonlinear renewal theory to bound the number of suboptimal pulls. The results connect nonasymptotic guarantees with classical information-theoretic limits, showing that carefully tuned exploration can attain the optimal leading constant for finite horizons. The work also discusses extensions to sub-Gaussian rewards and possible applications to broader sequential decision-making settings, including contextual bandits and reinforcement learning.
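
To make the two kinds of index concrete, here is a minimal simulation sketch. It assumes a Gaussian index of the form sample mean plus $\sigma\sqrt{2\,\ell/n_a}$, where the exploration level $\ell$ is either a constant $\log T$ (a hypothetical stand-in for the paper's $b_{T'}$) or $\log(T/n_a)$, which shrinks as arm $a$ is sampled (a simplified, hypothetical stand-in for Lai's $g$). The function names `ucb_index` and `run_ucb` are illustrative, not taken from the paper.

```python
import math

import numpy as np


def ucb_index(mean, n, sigma, level):
    """Gaussian UCB index: sample mean plus sigma * sqrt(2 * level / n)."""
    return mean + sigma * math.sqrt(2.0 * level / n)


def run_ucb(mus, sigma, horizon, mode="constant", seed=0):
    """Simulate a K-armed Gaussian bandit under a UCB rule (illustrative only).

    mode="constant": exploration level log(horizon), identical for all arms
        (hypothetical stand-in for a fixed level b_{T'}).
    mode="lai": exploration level log(horizon / n_a), decreasing in the
        arm's sample size n_a (hypothetical simplification of Lai's g).
    Returns the pull counts N_a and the pseudo-regret sum_a Delta_a * N_a.
    """
    rng = np.random.default_rng(seed)
    k = len(mus)
    counts = np.zeros(k, dtype=int)
    sums = np.zeros(k)
    for t in range(horizon):
        if t < k:
            arm = t  # pull each arm once so every sample mean is defined
        else:
            indices = []
            for a in range(k):
                n = counts[a]
                if mode == "constant":
                    level = math.log(horizon)
                else:
                    level = max(math.log(horizon / n), 0.0)
                indices.append(ucb_index(sums[a] / n, n, sigma, level))
            arm = int(np.argmax(indices))
        reward = rng.normal(mus[arm], sigma)
        counts[arm] += 1
        sums[arm] += reward
    gaps = max(mus) - np.asarray(mus, dtype=float)  # Delta_a = mu* - mu_a
    return counts, float(np.dot(gaps, counts))


counts, regret = run_ucb([0.0, -0.5, -1.0], sigma=1.0, horizon=5000, mode="lai")
print(counts, regret)
```

The constant-level rule explores every arm at the same rate regardless of how often it has been sampled, while the `"lai"` variant lowers the exploration bonus for arms that already have many samples; comparing the two pseudo-regrets over several seeds gives a rough feel for the difference.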

Abstract

In this memorial paper, we honor Tze Leung Lai's seminal contributions to the topic of multi-armed bandits, with a specific focus on his pioneering work on the upper confidence bound. We establish sharp non-asymptotic regret bounds for an upper confidence bound index with a constant level of exploration for Gaussian rewards. Furthermore, we establish a non-asymptotic regret bound for the upper confidence bound index of Lai (1987), which employs an exploration function that decreases with the sample size of the corresponding arm. The regret bounds have leading constants that match the Lai-Robbins lower bound. Our results highlight an aspect of Lai's seminal works that deserves more attention in the machine learning literature.
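
For Gaussian arms with a common known variance $\sigma^2$, the Lai–Robbins lower bound has a simple closed form: since $\mathrm{KL}(N(\mu_a,\sigma^2)\,\|\,N(\mu^*,\sigma^2)) = \Delta_a^2/(2\sigma^2)$ with $\Delta_a = \mu^* - \mu_a$, every uniformly good policy satisfies $\liminf_{T\to\infty} R_T/\log T \ge \sum_{a:\Delta_a>0} 2\sigma^2/\Delta_a$. The small sketch below computes this constant and the corresponding leading term; the helper names are hypothetical.

```python
import math


def lai_robbins_constant(mus, sigma):
    """Sum of 2 * sigma^2 / Delta_a over suboptimal arms (Gaussian, common variance)."""
    best = max(mus)
    # KL(N(mu_a, s^2) || N(best, s^2)) = Delta_a^2 / (2 s^2), so each
    # suboptimal arm contributes Delta_a / KL = 2 s^2 / Delta_a.
    return sum(2.0 * sigma ** 2 / (best - mu) for mu in mus if mu < best)


def lai_robbins_leading_term(mus, sigma, horizon):
    """Leading term of the Lai-Robbins lower bound: constant * log(horizon)."""
    return lai_robbins_constant(mus, sigma) * math.log(horizon)


print(lai_robbins_constant([0.0, -0.5, -1.0], sigma=1.0))  # 2/0.5 + 2/1 = 6.0
```

A regret bound "matching the Lai–Robbins lower bound" in the sense of the abstract is one whose coefficient of $\log T$ equals this constant.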

Paper Structure

This paper contains 17 sections, 8 theorems, and 60 equations.

Key Result

Theorem 1

Suppose the rewards from arm $a$ follow a Gaussian distribution with mean $\mu_a$ and variance at most $\sigma^2$, for all $a=1,\ldots,K$. Then the regret of the UCB rule (UCB-rule-T) satisfies the explicit upper bound stated in the paper, where $R_{T'}$ is the regret defined in (regret-def) and $\eta(b_{T'})$ is defined in (th-Gaussian-LaiRobbins-T-1).
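
The boundary-crossing view behind bounds of this type can be illustrated numerically. Assuming, for illustration only, an exploration term of the form $\sigma\sqrt{2b/n}$, the optimal arm's index drops below $\mu^*$ at sample size $n$ exactly when the standardized random walk $S_n = \sum_{i\le n}(X_i-\mu^*)/\sigma$ satisfies $S_n < -\sqrt{2bn}$. The hypothetical helper below estimates by Monte Carlo the probability that this happens for some $n \le n_{\max}$; it is a sketch of the repeated-significance-test event, not the paper's analytical bound.

```python
import numpy as np


def crossing_probability(b, n_max, n_paths=10000, seed=0):
    """Monte Carlo estimate of P(S_n <= -sqrt(2 b n) for some n <= n_max),
    where S_n is a random walk with i.i.d. N(0, 1) increments."""
    rng = np.random.default_rng(seed)
    steps = rng.standard_normal((n_paths, n_max))
    walks = np.cumsum(steps, axis=1)           # S_1, ..., S_{n_max} for each path
    n = np.arange(1, n_max + 1)
    boundary = -np.sqrt(2.0 * b * n)           # crossing <=> optimal index below mu*
    crossed = (walks <= boundary).any(axis=1)  # did the path ever cross?
    return float(crossed.mean())


for b in (1.0, 2.0, 4.0):
    print(b, crossing_probability(b, n_max=200))
```

Because the same seed yields the same paths, a larger exploration level $b$ gives a lower boundary and hence a crossing probability that can only decrease; this is the trade-off that tuning $b_{T'}$ balances against the cost of extra exploration.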

Theorems & Definitions (15)

  • Theorem 1
  • Remark 2
  • Corollary 3
  • Remark 4
  • Corollary 5
  • Theorem 6
  • Remark 7
  • Lemma 8
  • Proof
  • Lemma 9
  • ...and 5 more