lil'HDoC: An Algorithm for Good Arm Identification under Small Threshold Gap

Tzu-Hsien Tsai, Yun-Da Tsai, Shou-De Lin

TL;DR

This work tackles Good Arm Identification under a small threshold gap in stochastic multi-armed bandits. It introduces lil'HDoC, which begins with a short, controlled multi-sampling phase for every arm and employs a Law of Iterated Logarithm–based confidence bound to accelerate correct identification. Theoretical results show that the first $\lambda$ good arms require essentially the same effort as in HDoC up to a negligible term in the small-gap setting, while the total sample complexity improves from a $\log(1/\Delta)$ term to a $\log\log(1/\Delta)$ term. Empirical results on synthetic and real-world datasets confirm that lil'HDoC outperforms HDoC and LUCB-G in challenging scenarios, indicating practical benefits for rapid, reliable good-arm identification.
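To give a feel for the improvement from a $\log(1/\Delta)$ term to a $\log\log(1/\Delta)$ term, the following minimal sketch tabulates both quantities for shrinking gaps. It compares only the two terms named above; constants and any other factors in the actual bounds are deliberately left out, and the function names are hypothetical.

```python
import math


def log_term(gap: float) -> float:
    """Return log(1/gap), the term attributed to HDoC in the summary above."""
    if not 0.0 < gap < 1.0:
        raise ValueError("gap must lie in (0, 1)")
    return math.log(1.0 / gap)


def loglog_term(gap: float) -> float:
    """Return log(log(1/gap)), the term attributed to lil'HDoC above."""
    return math.log(log_term(gap))


if __name__ == "__main__":
    # Illustrative gaps only; they are not taken from the paper's experiments.
    for gap in (1e-1, 1e-2, 1e-3, 1e-4):
        print(f"gap={gap:.0e}  log={log_term(gap):.3f}  loglog={loglog_term(gap):.3f}")
```

Squaring the gap (for instance going from $10^{-2}$ to $10^{-4}$) doubles the first term but adds only $\log 2$ to the second, which is why the difference matters most when the threshold gap is small.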

Abstract

Good arm identification (GAI) is a pure-exploration bandit problem in which a single learner outputs an arm as soon as it is identified as a good arm. A good arm is defined as an arm with an expected reward greater than or equal to a given threshold. This paper focuses on the GAI problem under a small threshold gap, where the threshold gap is the distance between the expected rewards of arms and the given threshold. We propose a new algorithm called lil'HDoC to significantly improve the total sample complexity of the HDoC algorithm. We demonstrate that the sample complexity of the first $\lambda$ output arms in lil'HDoC is bounded by that of the original HDoC algorithm, except for one negligible term, when the distance between the expected reward and threshold is small. Extensive experiments confirm that our algorithm outperforms the state-of-the-art algorithms in both synthetic and real-world datasets.
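The problem setting described above can be sketched in a few lines. This is not lil'HDoC or HDoC: it is a generic threshold test that pulls undecided arms in round-robin order and uses an assumed Hoeffding-style anytime radius, $\sqrt{\log(4Kn^2/\delta)/(2n)}$, to decide when an arm is above or below the threshold. The Bernoulli rewards, the function names, and all parameter values are assumptions made for illustration.

```python
import math
import random


def confidence_radius(n: int, num_arms: int, delta: float) -> float:
    """Assumed Hoeffding-style anytime radius for rewards in [0, 1]."""
    return math.sqrt(math.log(4.0 * num_arms * n * n / delta) / (2.0 * n))


def identify_good_arms(means, threshold, delta=0.05, max_pulls=200_000, seed=0):
    """Output arms whose lower bound clears the threshold; drop arms whose
    upper bound falls below it. Returns (good arms in output order, pulls)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    active = set(range(k))
    good = []
    pulls = 0
    while active and pulls < max_pulls:
        # Round-robin over undecided arms (a simplification, not HDoC's sampling rule).
        for arm in sorted(active):
            reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
            counts[arm] += 1
            sums[arm] += reward
            pulls += 1
            mean = sums[arm] / counts[arm]
            radius = confidence_radius(counts[arm], k, delta)
            if mean - radius >= threshold:
                good.append(arm)  # identified as good: output immediately
                active.discard(arm)
            elif mean + radius < threshold:
                active.discard(arm)  # identified as bad: stop sampling it
    return good, pulls


if __name__ == "__main__":
    good, pulls = identify_good_arms([0.9, 0.1, 0.8], threshold=0.5)
    print("good arms:", sorted(good), "total pulls:", pulls)
```

Moving the arm means closer to the threshold makes the number of pulls grow quickly, which is the small-gap regime the paper targets.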

Paper Structure

This paper contains 14 sections, 5 theorems, 13 equations, 1 figure, 4 tables, 1 algorithm.

Key Result

Lemma: Finite form of LIL

Let $X_1, X_2, \dots, X_n$ be i.i.d. $\sigma$-sub-Gaussian random variables. Then for algorithm parameters $\epsilon \in (0, 1)$ and $\rho \in \left(0, \frac{\log(1 + \epsilon)}{e}\right)$, with probability at least $1 - c_{\epsilon} \rho^{1 + \epsilon}$, for all $t > 0$. Here, $U$ is the upper confidence bound
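The exact expression for $U$ is not reproduced in this summary. For orientation, the finite-LIL bound as it is commonly stated in the lil'UCB literature is $U(t, \rho) = (1 + \sqrt{\epsilon}) \sqrt{2 \sigma^2 (1 + \epsilon) \log\left(\log((1 + \epsilon) t) / \rho\right) / t}$ with $c_{\epsilon} = \frac{2 + \epsilon}{\epsilon} \left(\frac{1}{\log(1 + \epsilon)}\right)^{1 + \epsilon}$. The sketch below evaluates that form under the assumption that the lemma above uses the same shape; the function names and default values are hypothetical.

```python
import math


def lil_bound(t: int, rho: float, eps: float = 0.01, sigma: float = 0.5) -> float:
    """Finite-LIL upper bound on the empirical mean after t samples.

    Assumed form (from the lil'UCB literature, not confirmed against the paper):
    (1 + sqrt(eps)) * sqrt(2 sigma^2 (1 + eps) log(log((1 + eps) t) / rho) / t)
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if not 0.0 < rho < math.log(1.0 + eps) / math.e:
        raise ValueError("rho must lie in (0, log(1 + eps) / e)")
    # The constraint on rho guarantees inner > e, so the outer log exceeds 1.
    inner = math.log((1.0 + eps) * t) / rho
    return (1.0 + math.sqrt(eps)) * math.sqrt(
        2.0 * sigma**2 * (1.0 + eps) * math.log(inner) / t
    )


def failure_constant(eps: float) -> float:
    """c_eps in the failure probability c_eps * rho^(1 + eps) (assumed form)."""
    return (2.0 + eps) / eps * (1.0 / math.log(1.0 + eps)) ** (1.0 + eps)


if __name__ == "__main__":
    rho = 1e-3  # illustrative value inside the admissible range for eps = 0.01
    for t in (10, 100, 1_000, 10_000):
        print(f"t={t:>6}  U={lil_bound(t, rho):.4f}")
```

The radius shrinks roughly like $\sqrt{\log\log(t)/t}$, which is the source of the $\log\log(1/\Delta)$ dependence mentioned in the TL;DR.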

Figures (1)

  • Figure 1: Experimental Results

Theorems & Definitions (10)

  • Lemma: Finite form of LIL
  • Proof
  • Lemma
  • Proof
  • Theorem
  • Proof: Proof of Theorem (correctness)
  • Theorem: First $\lambda$ Arms Sample Complexity
  • Proof
  • Theorem: Sample Complexity
  • Proof