lil'HDoC: An Algorithm for Good Arm Identification under Small Threshold Gap
Tzu-Hsien Tsai, Yun-Da Tsai, Shou-De Lin
TL;DR
This work tackles Good Arm Identification (GAI) under a small threshold gap in stochastic multi-armed bandits. It introduces lil'HDoC, which begins with a short, controlled multi-sampling phase for every arm and then uses a confidence bound based on the Law of the Iterated Logarithm (LIL) to accelerate correct identification. Theoretical results show that, in the small-gap setting, the sample complexity of identifying the first $\lambda$ good arms matches that of HDoC up to one negligible term. Meanwhile, the total sample complexity improves from a $\log(1/\Delta)$ dependence to a $\log\log(1/\Delta)$ dependence, where $\Delta$ denotes the threshold gap. Empirical results on synthetic and real-world datasets confirm that lil'HDoC outperforms HDoC and LUCB-G in challenging scenarios, indicating practical benefits for rapid, reliable good-arm identification.
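The $\log(1/\Delta)$ versus $\log\log(1/\Delta)$ contrast comes from how quickly a confidence radius can shrink when it must hold simultaneously at every sample count. The Python sketch below compares two such radii for rewards in [0, 1], treated as $1/2$-sub-Gaussian. The first is a Hoeffding radius made anytime-valid with a union bound over sample counts. The second is an LIL-style radius shaped like the one in lil'UCB (Jamieson et al., 2014). The helper names, the constants (`SIGMA`, `eps`), and the clipping of the inner logarithm are illustrative assumptions, not the bound used in lil'HDoC.

```python
import math

SIGMA = 0.5  # sub-Gaussian scale for rewards in [0, 1] (assumption)


def union_radius(n: int, delta: float) -> float:
    # Hoeffding radius valid for all n at once: allot delta / (n (n + 1))
    # to sample count n, which sums to delta over n = 1, 2, ...
    return math.sqrt(2 * SIGMA**2 * math.log(2 * n * (n + 1) / delta) / n)


def lil_radius(n: int, delta: float, eps: float = 0.5) -> float:
    # LIL-style radius in the shape of lil'UCB; the inner log is clipped at 1
    # so the expression stays defined for small n (illustrative choice).
    inner = max(math.log((1 + eps) * n), 1.0)
    scale = (1 + math.sqrt(eps)) * math.sqrt(2 * SIGMA**2 * (1 + eps))
    return scale * math.sqrt(math.log(inner / delta) / n)


def samples_needed(radius, gap: float, delta: float) -> int:
    # Smallest n with radius(n) <= gap / 2, found by doubling then bisection
    # (assumes the radius is non-increasing in n).
    hi = 1
    while radius(hi, delta) > gap / 2:
        hi *= 2
    lo = hi // 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if radius(mid, delta) <= gap / 2:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for gap in (0.1, 0.01, 0.001):
        u = samples_needed(union_radius, gap, 0.05)
        l = samples_needed(lil_radius, gap, 0.05)
        print(f"gap={gap}: union={u}, lil={l}, ratio={u / l:.2f}")
```

With these constants the LIL radius is slightly wider at moderate sample counts, so its advantage appears only once the gap is small. That is the regime the paper targets.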
Abstract
Good arm identification (GAI) is a pure-exploration bandit problem in which a single learner outputs an arm as soon as it is identified as a good arm. A good arm is an arm whose expected reward is greater than or equal to a given threshold. This paper focuses on the GAI problem under a small threshold gap, where the threshold gap is the distance between the expected rewards of the arms and the given threshold. We propose a new algorithm, lil'HDoC, that significantly improves the total sample complexity of the HDoC algorithm. When the distance between the expected rewards and the threshold is small, we show that the sample complexity of the first $\lambda$ output arms in lil'HDoC is bounded by that of the original HDoC algorithm, except for one negligible term. Extensive experiments confirm that our algorithm outperforms state-of-the-art algorithms on both synthetic and real-world datasets.
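The GAI protocol works as follows: keep sampling, output an arm as soon as it is certified good, and stop sampling arms certified as not good. The minimal Python illustration below makes this concrete under stated assumptions. It uses simulated Bernoulli arms and a hypothetical `gai_sketch` function. A hypothetical `warmup` count stands in for lil'HDoC's initial multi-sampling phase, whose actual length is not given here. The bounds are Hoeffding-based in the spirit of HDoC, with constants chosen for illustration. This is not the authors' implementation. According to the TL;DR, lil'HDoC instead uses an LIL-based confidence bound for identification.

```python
import math
import random


def gai_sketch(means, threshold, delta, warmup=1, max_pulls=200_000, seed=0):
    """Hypothetical HDoC-style GAI loop on simulated Bernoulli arms.

    Returns arm indices in the order they are declared good. `warmup` (>= 1)
    pulls of every arm stand in for an initial multi-sampling phase.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    active = set(range(k))  # arms not yet declared good or not good
    good = []

    def pull(i):
        # Simulated Bernoulli reward with mean means[i].
        sums[i] += 1.0 if rng.random() < means[i] else 0.0
        counts[i] += 1

    def radius(i):
        # Hoeffding radius for [0, 1] rewards with a union bound over the
        # k arms and all sample counts n (sum of 1/n^2 < 2).
        n = counts[i]
        return math.sqrt(math.log(4 * k * n * n / delta) / (2 * n))

    def optimistic_index(j):
        # Sampling index: empirical mean plus sqrt(log t / (2 n_j)).
        return sums[j] / counts[j] + math.sqrt(math.log(t) / (2 * counts[j]))

    # Warm-up phase: every arm is pulled `warmup` times.
    for i in range(k):
        for _ in range(warmup):
            pull(i)
    t = k * warmup

    while active and t < max_pulls:
        i = max(active, key=optimistic_index)
        pull(i)
        t += 1
        mean = sums[i] / counts[i]
        if mean - radius(i) >= threshold:
            good.append(i)  # lower bound clears the threshold: output now
            active.remove(i)
        elif mean + radius(i) < threshold:
            active.remove(i)  # upper bound below threshold: arm is not good
    return good


if __name__ == "__main__":
    print(gai_sketch([0.9, 0.1, 0.8, 0.2], threshold=0.5, delta=0.05))
```

An arm is declared good only when its whole confidence interval lies at or above the threshold. Arms are therefore output one at a time as soon as each is certified, which matches the anytime output requirement of GAI.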
