On Lai's Upper Confidence Bound in Multi-Armed Bandits

Huachen Ren; Cun-Hui Zhang

TL;DR

The paper provides sharp nonasymptotic regret bounds for two Lai–type UCB policies in Gaussian K-armed bandits: (i) a UCB with a fixed exploration level $b_{T'}$ achieving a leading constant matching the Lai–Robbins lower bound, and (ii) a nonasymptotic bound for Lai's UCB index with $g(x)$ linked to $\log x$, both with explicit control of second-order terms via Brownian-boundary analyses. The authors develop a novel analytical approach that leverages boundary-crossing probabilities of random walks, cast as repeated significance tests, and nonlinear renewal theory to bound the number of suboptimal pulls. The results connect nonasymptotic guarantees with classical information-theoretic limits, demonstrating that carefully tuned exploration can attain optimal constants even for finite horizons. The work also discusses extensions to sub-Gaussian rewards and potential applications to broader sequential decision-making settings, including contextual bandits and reinforcement learning.
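The two policies described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' exact rule: the index form `mean + sigma * level(n) / sqrt(n)`, the constant level `sqrt(2 log T)`, the choice $g(x) = \log x$, and the names `run_index_policy`, `constant_level`, and `lai_level` are all hypothetical choices made for illustration.

```python
import math
import random


def run_index_policy(means, sigma, horizon, level, seed=0):
    """Simulate a UCB-type index policy on Gaussian arms.

    Assumed index (illustrative only): mean_a + sigma * level(n_a) / sqrt(n_a),
    where n_a is the number of pulls of arm a so far.
    Returns the pull counts and the realized pseudo-regret sum_a gap_a * n_a.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # pull each arm once so every sample mean is defined
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + sigma * level(counts[a]) / math.sqrt(counts[a]),
            )
        reward = rng.gauss(means[arm], sigma)
        counts[arm] += 1
        sums[arm] += reward
    best = max(means)
    regret = sum((best - m) * n for m, n in zip(means, counts))
    return counts, regret


T = 2000
means = [0.0, 0.5, 1.0]


def constant_level(n):
    # Assumption: a fixed exploration level that does not depend on n.
    return math.sqrt(2.0 * math.log(T))


def lai_level(n):
    # Assumption: g(x) = log x evaluated at x = T / n, so the level shrinks as n grows.
    return math.sqrt(2.0 * max(math.log(T / n), 0.0))


counts_const, regret_const = run_index_policy(means, 1.0, T, constant_level)
counts_lai, regret_lai = run_index_policy(means, 1.0, T, lai_level)
print("constant level:", counts_const, round(regret_const, 1))
print("Lai-type level:", counts_lai, round(regret_lai, 1))
```

Passing the exploration level as a function keeps the two policies on a single code path, so the only difference between them is whether the level depends on the arm's sample size.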

Abstract

In this memorial paper, we honor Tze Leung Lai's seminal contributions to the topic of multi-armed bandits, with a specific focus on his pioneering work on the upper confidence bound. We establish sharp non-asymptotic regret bounds for an upper confidence bound index with a constant level of exploration for Gaussian rewards. Furthermore, we establish a non-asymptotic regret bound for the upper confidence bound index of Lai (1987) which employs an exploration function that decreases with the sample size of the corresponding arm. The regret bounds have leading constants that match the Lai-Robbins lower bound. Our results highlight an aspect of Lai's seminal works that deserves more attention in the machine learning literature.
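For reference, the Lai-Robbins benchmark can be written out in the Gaussian case. This is the standard statement for arms with a common known variance $\sigma^2$ and any consistent policy, not a formula quoted from the paper; the symbols $\Delta_a$, $\mu^*$, and $n_a(T)$ are notation assumed here.

$$
R_T = \sum_{a:\Delta_a>0} \Delta_a\, \mathbb{E}[n_a(T)], \qquad \Delta_a = \mu^* - \mu_a, \qquad \mu^* = \max_{1\le a\le K} \mu_a,
$$

$$
\liminf_{T\to\infty} \frac{R_T}{\log T} \;\ge\; \sum_{a:\Delta_a>0} \frac{\Delta_a}{\mathrm{KL}(\mu_a,\mu^*)} \;=\; \sum_{a:\Delta_a>0} \frac{2\sigma^2}{\Delta_a},
$$

where the last equality uses $\mathrm{KL}\big(N(\mu_a,\sigma^2),\,N(\mu^*,\sigma^2)\big) = \Delta_a^2/(2\sigma^2)$. A regret bound "matches" this benchmark when its leading term is $\sum_{a:\Delta_a>0} (2\sigma^2/\Delta_a)\log T$.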

Paper Structure (17 sections, 8 theorems, 60 equations)

Introduction
The Lai--Robbins lower bound
Lai's UCB
Recent developments
Our contributions
Organization
Main results
Problem setting
Regret bounds for UCB with a constant level of exploration
Regret bound for Lai's UCB
Proofs of regret bounds
Proof of Theorem 1
Proof of Corollary 3
Proof of Corollary 5
Proof of Theorem 6
...and 2 more sections

Key Result

Theorem 1

Suppose the rewards from arm $a$ follow a Gaussian distribution with mean $\mu_a$ and variance no greater than $\sigma^2$ for all $a=1,\ldots,K$. Then the regret of the UCB rule (UCB-rule-T) is bounded by [display equation missing], where $R_{T'}$ is defined in (regret-def) and $\eta(b_{T'})$ is defined in (th-Gaussian-LaiRobbins-T-1).
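To get a feel for the scale of such a bound, one can evaluate the leading term of the Gaussian Lai-Robbins benchmark, $\sum_{a:\Delta_a>0} (2\sigma^2/\Delta_a)\log T$, for a concrete instance. The sketch below computes only this asymptotic leading term; it is not the theorem's bound, which also carries second-order terms, and the function name `lai_robbins_benchmark` and the example values are hypothetical.

```python
import math


def lai_robbins_benchmark(means, sigma, horizon):
    """Leading term sum_a (2 * sigma^2 / gap_a) * log(horizon) over suboptimal arms.

    Assumes Gaussian arms with common variance sigma^2 (illustrative only).
    """
    best = max(means)
    constant = sum(2.0 * sigma**2 / (best - m) for m in means if m < best)
    return constant * math.log(horizon)


# Example instance (assumed values): gaps 1.0 and 0.5, so the constant is 2/1 + 2/0.5 = 6.
print(lai_robbins_benchmark([0.0, 0.5, 1.0], 1.0, 1000))
```

Smaller gaps inflate the constant, since each suboptimal arm contributes in inverse proportion to its gap.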

Theorems & Definitions (15)

Theorem 1
Remark 2
Corollary 3
Remark 4
Corollary 5
Theorem 6
Remark 7
Lemma 8
proof
Lemma 9
...and 5 more
