A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms

Dorian Baudry; Kazuya Suzuki; Junya Honda

TL;DR

This paper revisits two famous bandit algorithms, Minimum Empirical Divergence (MED) and Thompson Sampling (TS), under various models for the reward distributions, including single-parameter exponential families, Gaussian distributions, bounded distributions, and distributions satisfying some conditions on their moments.

Abstract

In this paper we propose a general methodology to derive regret bounds for randomized multi-armed bandit algorithms. It consists of checking a set of sufficient conditions on the sampling probability of each arm and on the family of distributions in order to prove logarithmic regret. As a direct application we revisit two famous bandit algorithms, Minimum Empirical Divergence (MED) and Thompson Sampling (TS), under various models for the distributions, including single-parameter exponential families, Gaussian distributions, bounded distributions, and distributions satisfying some conditions on their moments. In particular, we prove that MED is asymptotically optimal for all these models, and we also provide a simple regret analysis of some TS algorithms for which optimality is already known. We then further illustrate the value of our approach by analyzing a new Non-Parametric TS algorithm (h-NPTS), adapted to some families of unbounded reward distributions with a bounded h-moment. This model can, for instance, capture some non-parametric families of distributions whose variance is upper bounded by a known constant.
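To make the notion of a "sampling probability of each arm" concrete, here is a minimal sketch of an MED-style policy. It assumes Bernoulli rewards and assumes that each arm is drawn with probability proportional to exp(-N_k * kl(mean_k, best_mean)), where N_k is the number of pulls of arm k and kl is the Bernoulli Kullback-Leibler divergence. The function names, the clipping constant `eps`, and the round-robin initialization are illustrative choices, not taken from the paper.

```python
import math
import random


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def med_sampling_probabilities(means, counts):
    """Assumed MED-style rule: weight of arm k is exp(-N_k * kl(mean_k, best_mean))."""
    best = max(means)
    weights = [math.exp(-n * bernoulli_kl(m, best)) for m, n in zip(means, counts)]
    total = sum(weights)  # at least 1, since the empirical best arm has weight 1
    return [w / total for w in weights]


def run_med(true_means, horizon, seed=0):
    """Simulate the sketch on Bernoulli arms and return the pull count of each arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(horizon):
        if t < n_arms:
            arm = t  # pull every arm once before using the randomized rule
        else:
            means = [s / n for s, n in zip(sums, counts)]
            probs = med_sampling_probabilities(means, counts)
            arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

For example, `run_med([0.9, 0.5], horizon=1000)` returns the number of pulls of each arm. Under this rule, an arm whose empirical mean is far below the current best receives an exponentially small sampling probability as its pull count grows.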

Paper Structure (119 sections, 30 theorems, 237 equations, 6 figures, 1 table, 7 algorithms)


Introduction
Families of distributions
Asymptotically optimal bandit algorithms
Other standard performance metrics
Outline and contributions
Preliminaries
Notation and terminology
Presentation of the randomized policies under study
Minimum Empirical Divergence
Thompson Sampling
Analysis of the sampling probabilities under TS$^\star$
Potential computational gain with TS$^\star$
Theoretical results: generic analysis of randomized policies
Summary
Outline
...and 104 more sections

Key Result

Lemma 1

Under TS$^\star$, for any $t\in [T]$ and $k\in [K]$ it holds that [...] and that [...]. In addition, $\mathbb{P}(k \in \mathcal{A}_t|\mathcal{H}_{t-1})=\mathbb{P}(\widetilde{\mu}_k(t)\geq \mu^\star(t)|\mathcal{H}_{t-1}) \coloneqq [\mathrm{BCP}]$ if $\mu_k(t) < \mu^\star(t)$, and is equal to $1$ otherwise.
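As an illustration of the boundary crossing probability (BCP) in Lemma 1, the sketch below evaluates $\mathbb{P}(\widetilde{\mu}_k(t)\geq \mu^\star(t))$ in closed form under one specific assumption: the sampled mean $\widetilde{\mu}_k(t)$ is Gaussian, centered at the empirical mean $\mu_k(t)$ with variance $\sigma^2/N_k(t)$. That Gaussian sampler, the function name `gaussian_bcp` and the parameter `sigma` are hypothetical choices made for this example. The lemma itself does not depend on them.

```python
import math


def gaussian_bcp(mu_k, mu_star, n_k, sigma=1.0):
    """P(k in A_t | history) when the sampled mean is N(mu_k, sigma^2 / n_k).

    Follows the case split of Lemma 1: the probability is 1 when the
    empirical mean mu_k already reaches the best empirical mean mu_star,
    and is the Gaussian upper-tail probability otherwise.
    """
    if mu_k >= mu_star:
        return 1.0
    # Standardized distance between the empirical mean and the boundary.
    z = (mu_star - mu_k) * math.sqrt(n_k) / sigma
    # Upper tail of the standard normal: P(Z >= z) = erfc(z / sqrt(2)) / 2.
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

With a fixed gap between `mu_k` and `mu_star`, the value decreases as `n_k` grows, which is the behavior a regret analysis based on sampling probabilities relies on.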

Figures (6)

Figure 1: Average regret and $10$--$90\%$ percentiles as a function of $T$ on $500$ runs, for experiments Bernoulli 1 (top left), Bernoulli 2 (top right), Bernoulli 3 (bottom left) and Bernoulli 4 (bottom right).
Figure 2: Average regret and $10$--$90\%$ percentiles as a function of $T$ on $500$ runs, for experiments Gauss 1 (top left), Gauss 2 (top right), Gauss 3 (bottom left) and Gauss 4 (bottom right).
Figure 3: Example of Beta distribution with Gaussian shape (Left, $a=b=10$), and Exponential shape (Right, $a=1,b=4$).
Figure 4: Average regret and $10$--$90\%$ percentiles as a function of $T$ on $500$ runs, for experiments Beta 1 (top left), Beta 2 (top right), Beta 3 (bottom left) and Beta 4 (bottom right).
Figure 5: Average regret and $10$--$90\%$ percentiles as a function of $T$ on $500$ runs, for experiments Gauss 1 with $h$-NPTS (top left), GM 1 (top right), GM 2 (bottom left) and GM 3 (bottom right).
...and 1 more figures

Theorems & Definitions (37)

Lemma 1: Bounding sampling probabilities with the BCP under TS$^\star$
Lemma 2
Corollary 1: of Lemma 1, the assumption on the sampling probabilities in the TS$^\star$ framework
Remark 1: Mean estimates
Theorem 1
Lemma 3
Theorem 2: Theorem 36.2 in Lattimore and Szepesvári, Bandit Algorithms
Lemma 4: Problem-independent bound for sub-Gaussian MED
Proposition 1: Problem-independent bounds for MED (general case)
Definition 1
...and 27 more
