On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits

Leo Maynard-Zhang; Zhihan Xiong; Kevin Jamieson; Maryam Fazel


TL;DR

The paper proposes the Adjacent-optimal design, a specialization of the well-known $\mathcal{X}\mathcal{Y}$-optimal design, and proves that the error probability of the resulting Adjacent-BAI algorithm matches an arm-set-dependent lower bound up to constants. This shows that the lower bound is tight and establishes the arm-set-dependent complexity of this setting.

Abstract

We study the fixed-budget best-arm identification (BAI) problem in non-stationary linear bandits. Concretely, given a fixed time budget $T\in \mathbb{N}$, a finite arm set $\mathcal{X} \subset \mathbb{R}^d$, and a potentially adversarial sequence of unknown parameters $\lbrace \theta_t\rbrace_{t=1}^{T}$ (hence non-stationary), a learner aims to identify, with high probability, the arm with the largest cumulative reward, $x_* = \arg\max_{x \in \mathcal{X}} x^\top\sum_{t=1}^T \theta_t$. In this setting, it is well known that uniformly sampling arms from the G-optimal design yields a minimax-optimal error probability of $\exp\left(-\Theta\left(T / H_{G}\right)\right)$, where $H_{G}$ scales proportionally with the dimension $d$. However, this notion of complexity is overly pessimistic: it is derived from a lower bound in which the arm set consists only of the standard basis vectors, which masks any potential advantage of arm sets with richer geometric structure. To address this, we establish an arm-set-dependent lower bound that, in contrast, holds for any arm set. Motivated by the ideas underlying our lower bound, we propose the Adjacent-optimal design, a specialization of the well-known $\mathcal{X}\mathcal{Y}$-optimal design, and develop the $\textsf{Adjacent-BAI}$ algorithm. We prove that the error probability of $\textsf{Adjacent-BAI}$ matches our lower bound up to constants, thereby verifying that the lower bound is tight and establishing the arm-set-dependent complexity of this setting.
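To make the G-optimal baseline mentioned in the abstract concrete, here is a minimal Python/NumPy sketch under stated assumptions; it is not the paper's $\textsf{Adjacent-BAI}$. It approximates the G-optimal design with the Fedorov-Wynn method (Frank-Wolfe on $\log\det A(\lambda)$), stopping via the Kiefer-Wolfowitz theorem, which says the optimal value of $\max_x \|x\|^2_{A(\lambda)^{-1}}$ is $d$. It then draws each $x_t$ i.i.d. from the design (our reading of "uniformly sampling") and estimates $\sum_t \theta_t$ with the inverse-design-weighted estimator $A(\lambda)^{-1}\sum_t x_t r_t$. We assume the $\theta_t$ are fixed in advance (an oblivious adversary), Gaussian reward noise, and arms that span $\mathbb{R}^d$; the function names and the drifting $\theta_t$ sequence are hypothetical.

```python
import numpy as np


def g_optimal_design(X, tol=1e-3, max_iters=10_000):
    """Approximate G-optimal design over the rows of X (Fedorov-Wynn method).

    Kiefer-Wolfowitz: the optimal design has max_x ||x||^2_{A(lam)^{-1}} = d,
    so iterate until the largest leverage is within a factor (1 + tol) of d.
    """
    n, d = X.shape
    lam = np.full(n, 1.0 / n)  # uniform start; assumes the rows of X span R^d
    for _ in range(max_iters):
        A = X.T @ (lam[:, None] * X)  # A(lam) = sum_x lam_x x x^T
        lev = np.einsum("ij,jk,ik->i", X, np.linalg.inv(A), X)  # ||x||^2_{A^{-1}}
        i = int(np.argmax(lev))
        if lev[i] <= d * (1 + tol):
            break
        step = (lev[i] - d) / (d * (lev[i] - 1))  # exact line search on log det A
        lam = (1 - step) * lam
        lam[i] += step
    return lam


def estimate_best_arm(X, thetas, lam, noise_sd, rng):
    """Draw x_t ~ lam i.i.d. for t = 1..T and return argmax_x x^T theta_hat.

    theta_hat = A(lam)^{-1} sum_t x_t r_t is unbiased for sum_t theta_t when the
    theta_t are fixed in advance, because E[A^{-1} x_t x_t^T] = I.
    """
    T = thetas.shape[0]
    A_inv = np.linalg.inv(X.T @ (lam[:, None] * X))
    idx = rng.choice(len(X), size=T, p=lam / lam.sum())
    xs = X[idx]
    rewards = np.einsum("ij,ij->i", xs, thetas) + noise_sd * rng.standard_normal(T)
    theta_sum_hat = A_inv @ (xs.T @ rewards)
    return int(np.argmax(X @ theta_sum_hat))


# Usage with a hypothetical drifting sequence: theta_t alternates between two values.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]])
T = 20_000
thetas = np.where(np.arange(T)[:, None] % 2 == 0, [1.0, -0.5], [0.2, 0.4])
lam = g_optimal_design(X)
best = estimate_best_arm(X, thetas, lam, noise_sd=0.1, rng=np.random.default_rng(0))
# Ground truth: X @ thetas.sum(axis=0) = [12000, -1000, 6600], so x_* is arm 0.
print(lam.round(3), best)
```

Because $\lambda$ depends only on $\mathcal{X}$ and not on the $\theta_t$, sampling from a fixed design keeps the estimator unbiased however the parameters drift, provided the sequence is fixed in advance. For the standard basis (the arm set behind the minimax lower bound), the computed design is simply uniform.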


Paper Structure (35 sections, 13 theorems, 125 equations, 1 algorithm)

Introduction
Our contributions.
Related Work
Best-arm identification with fixed confidence.
Best-arm identification with fixed budget.
Best-arm identification in non-stationary environments.
Preliminaries
Notation.
BAI in Non-Stationary Linear Bandits
Minimax-Optimal Complexity in Non-Stationary BAI
Adjacency and Non-Stationary BAI
Lower Bound of BAI in Non-Stationary Linear Bandits
Proof Sketch of the Lower-Bound Theorem
First step.
Second step.
...and 20 more sections

Key Result

Lemma 1 (Adjacency Lemma)

Let $\mathcal{X} \subset \mathbb{R}^d$ and $\theta\in\mathbb{R}^d$. For any $x \in \mathcal{V}$, there exists $y \in \mathcal{X}$ such that $(y-x)^\top \theta > 0$ if and only if there exists $z \in \mathcal{I}^x$ such that $(z-x)^\top \theta > 0$.
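Lemma 1 refers to $\mathcal{V}$ and $\mathcal{I}^x$, which are not defined in this excerpt. Assuming, for illustration only, that $\mathcal{V}$ is the vertex set of $\mathrm{conv}(\mathcal{X})$ and that $\mathcal{I}^x$ is the set of arms adjacent to $x$ along an edge of that hull, the following Python sketch checks the lemma's equivalence in $d = 2$, where adjacency reduces to neighboring vertices of a convex polygon. Integer coordinates keep every comparison exact. The helper names are hypothetical, and the check is a sanity test, not a proof.

```python
import random


def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order.

    Collinear boundary points are dropped, so only true vertices remain.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def adjacency(hull):
    """Assumed I^x in 2-D: the (at most two) hull neighbors of each vertex x."""
    n = len(hull)
    return {hull[i]: {hull[i - 1], hull[(i + 1) % n]} - {hull[i]} for i in range(n)}


def improves(y, x, theta):
    """True iff (y - x)^T theta > 0."""
    return (y[0] - x[0]) * theta[0] + (y[1] - x[1]) * theta[1] > 0


def check_adjacency_lemma(X, theta):
    """Check Lemma 1's equivalence at every vertex x of conv(X)."""
    hull = convex_hull(X)
    adj = adjacency(hull)
    return all(
        any(improves(y, x, theta) for y in X)
        == any(improves(z, x, theta) for z in adj[x])
        for x in hull
    )


# Usage: random integer arm sets (interior and collinear points allowed).
random.seed(0)
trials = 0
for _ in range(500):
    X = [(random.randint(-10, 10), random.randint(-10, 10)) for _ in range(8)]
    theta = (random.randint(-5, 5), random.randint(-5, 5))
    assert check_adjacency_lemma(X, theta)
    trials += 1
print(f"equivalence held in all {trials} random instances")
```

Under the assumed definitions, the equivalence is what makes adjacency useful: to certify that a vertex is not the best arm, it suffices to compare it with its neighbors rather than with all of $\mathcal{X}$, much like the optimality test at a vertex in the simplex method.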

Theorems & Definitions (28)

Definition 1
Lemma 1: Adjacency Lemma
Theorem 1
Lemma 2
Definition 2
Lemma 3: Optimization-based Lower Bound
Lemma 4
Theorem 2: Error probability of Adjacent-BAI
Lemma 5: Sub-Gaussian error of least-squares estimator
Proposition 1
...and 18 more
