
Optimal Thresholding Linear Bandit

Eduardo Ochoa Rivera, Ambuj Tewari

TL;DR

This work studies the $\epsilon$-Thresholding Bandit Problem (TBP) in stochastic linear bandits, where the goal is to identify, with fixed confidence, all arms whose mean rewards exceed a threshold $\rho$. It derives an instance-specific, finite-sample lower bound on the expected number of samples and develops a Track-and-Stop-style algorithm that is asymptotically optimal for TBP in the linear setting. The algorithm combines a least-squares estimator with forced exploration, a sampling rule that tracks the optimal design over arms, and a stopping rule based on a generalized likelihood ratio (GLR). The analysis shows that the set of optimal sampling proportions is convex, which lets the algorithm's allocation converge to this optimal set, and it gives both high-probability and in-expectation guarantees on the stopping time that match the lower bound up to constants. Empirically, the method performs competitively against baseline pure-exploration algorithms on TBP-specific benchmarks; the work also points to extensions to relaxed threshold settings and generalized linear models for broader scientific discovery tasks.
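
The three ingredients named above (least-squares estimation, tracking with forced exploration, GLR stopping) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes $\epsilon = 0$ and unit-variance Gaussian noise, and it assumes the target proportions `w` come from a separate optimal-design solver that is not shown. The threshold in `should_stop` is a placeholder heuristic, not the paper's calibrated threshold. Forced exploration uses the D-tracking rule of the original Track-and-Stop over all arms. All function names are illustrative. The GLR statistic relies on one geometric fact: the closest parameter that puts arm $x_i$ on the other side of $\rho$ lies at $V$-norm distance $|x_i^\top\hat\theta - \rho| / \|x_i\|_{V^{-1}}$ from $\hat\theta$.

```python
from math import log, sqrt


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def least_squares(history, d, reg=1e-6):
    """Regularized least squares: theta_hat = (sum x x^T + reg*I)^{-1} sum r*x."""
    V = [[reg if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for x, r in history:
        for i in range(d):
            b[i] += r * x[i]
            for j in range(d):
                V[i][j] += x[i] * x[j]
    return solve(V, b), V


def glr_statistic(arms, theta_hat, V, rho):
    """min_i (x_i^T theta_hat - rho)^2 / (2 ||x_i||^2_{V^{-1}}): cost of flipping any arm."""
    return min(
        (dot(x, theta_hat) - rho) ** 2 / (2.0 * dot(x, solve(V, x)))
        for x in arms
    )


def should_stop(Z, t, delta):
    # Hypothetical placeholder threshold, NOT the paper's beta(t, delta).
    beta = log((1.0 + log(t)) / delta)
    return Z > beta


def next_arm(counts, w, t):
    """D-tracking: force under-sampled arms, else track t * w_k - N_k(t)."""
    K = len(counts)
    under = [k for k in range(K) if counts[k] < sqrt(t) - K / 2]
    if under:  # forced exploration
        return min(under, key=lambda k: counts[k])
    return max(range(K), key=lambda k: t * w[k] - counts[k])
```

As a usage example, take two orthogonal arms pulled ten times each, returning rewards 1 and 0, with $\rho = 0.5$. The estimate is close to $(1, 0)$ and the GLR statistic is about $1.25$. That value is below the placeholder threshold for $\delta = 0.1$ at $t = 20$, so sampling would continue.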

Abstract

We study a novel pure exploration problem: the $\epsilon$-Thresholding Bandit Problem (TBP) with fixed confidence in stochastic linear bandits. We prove a lower bound on the sample complexity and extend an algorithm designed for Best Arm Identification in the linear case to TBP, obtaining an algorithm that is asymptotically optimal.
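
For orientation, here is the generic shape that a fixed-confidence lower bound of this kind takes. This is a sketch under assumptions the abstract does not state: $\epsilon = 0$ (exact thresholding), unit-variance Gaussian noise, $K$ arms $x_1, \dots, x_K$ with means $x_i^\top\theta$ that span the parameter space, and the standard change-of-measure argument behind Track-and-Stop. The paper's $\epsilon$-dependent, finite-sample bound may differ in its exact form. The key step is that the nearest alternative parameter flips a single arm across its hyperplane $x_i^\top\lambda = \rho$.

```latex
% Design matrix induced by sampling proportions w in the simplex \Delta_K
A(w) = \sum_{k=1}^{K} w_k \, x_k x_k^\top
% Alternatives: parameters that put at least one arm on the other side of \rho
\mathrm{Alt}(\theta) = \bigcup_{i=1}^{K}
  \{\lambda : \operatorname{sign}(x_i^\top\lambda - \rho) \neq \operatorname{sign}(x_i^\top\theta - \rho)\}
% Squared A(w)-norm distance from \theta to the closest hyperplane x_i^\top\lambda = \rho
\inf_{\lambda \in \mathrm{Alt}(\theta)} \tfrac{1}{2}\|\theta - \lambda\|_{A(w)}^2
  = \min_{i} \frac{(x_i^\top\theta - \rho)^2}{2\,\|x_i\|_{A(w)^{-1}}^2}
% Resulting bound for any \delta-correct strategy
\mathbb{E}_\theta[\tau_\delta] \ge T^*(\theta)\,\mathrm{kl}(\delta, 1-\delta),
\qquad
T^*(\theta)^{-1} = \max_{w \in \Delta_K} \min_{i}
  \frac{(x_i^\top\theta - \rho)^2}{2\,\|x_i\|_{A(w)^{-1}}^2}
```

Since $\mathrm{kl}(\delta, 1-\delta) = (1-2\delta)\log\frac{1-\delta}{\delta}$ behaves like $\log(1/\delta)$ as $\delta \to 0$, a bound of this shape scales as $T^*(\theta)\log(1/\delta)$. The maximizing $w$ is the optimal design that a tracking rule aims for.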

Paper Structure (15 sections, 13 theorems, 120 equations, 1 figure, 2 tables, 1 algorithm)

Key Result

Proposition 3.1

Let $\mathbf{x}_n$ be an allocation such that $\mathcal{S}^*(\mathbf{x}_n) \subseteq \tilde{\mathcal{C}} \cap \underaccent{\tilde{}}{\mathcal{C}}$, then

Figures (1)

  • Figure 1: The half-planes represent the values of $\theta$ for which the corresponding point lies above the threshold $\rho$ (that is, $x_i^\top\theta > \rho$). The red one corresponds to $x_1$, the blue one to $x_2$, and the yellow one to $x_3$.
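
The geometry in Figure 1 can be checked numerically. An arm is above the threshold exactly when $\theta$ falls inside that arm's half-plane, and the signed distance of $\theta$ to the boundary line shows how far it sits from flipping. This is a minimal sketch. The arm coordinates and $\theta$ below are hypothetical values chosen for illustration and are not taken from the paper's figure.

```python
from math import sqrt


def signed_margin(x, theta, rho):
    """Signed Euclidean distance from theta to the line {t : x^T t = rho}.

    Positive means theta lies in the half-plane {t : x^T t > rho},
    i.e. the arm x is above the threshold under parameter theta.
    """
    inner = sum(a * b for a, b in zip(x, theta))
    return (inner - rho) / sqrt(sum(a * a for a in x))


def arms_above_threshold(arms, theta, rho):
    """Indices of arms whose half-plane contains theta (x^T theta > rho)."""
    return [i for i, x in enumerate(arms) if signed_margin(x, theta, rho) > 0]


# Hypothetical 2-D instance (illustrative coordinates, not the paper's).
arms = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # x1, x2, x3
theta = [0.8, 0.3]
print(arms_above_threshold(arms, theta, rho=0.5))  # [0, 2]
```

In this instance $x_1$ and $x_3$ are above the threshold and $x_2$ is below it, at distance $0.2$ from its boundary. An arm whose margin is small in the relevant norm is the one that drives the sample complexity.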

Theorems & Definitions (24)

  • Definition 2.1
  • Definition 2.2
  • Proposition 3.1
  • Proof
  • Theorem 3.2
  • Proof
  • Corollary 3.3
  • Lemma 4.1
  • Lemma 4.2
  • Lemma 4.3
  • ...and 14 more