Differential Good Arm Identification
Yun-Da Tsai, Tzu-Hsien Tsai, Shou-De Lin
TL;DR
This work addresses good arm identification (GAI) in stochastic bandits, where the goal is to identify as many arms as possible whose mean reward exceeds a threshold $\xi$, using as few samples as possible. It introduces DGAI, a differentiable algorithm that learns adaptive confidence bounds via a differentiable UCB index, with separate training objectives for sampling and identification, and proves a $\delta$-PAC guarantee for the linear case. DGAI outperforms state-of-the-art baselines (e.g., HDoC, LUCB-G, APT-G) on synthetic and real-world datasets for GAI, and it can also improve cumulative reward maximization in MAB problems when a threshold is provided as prior knowledge. The approach yields data-driven, problem-adaptive confidence bounds that improve sample efficiency and decision quality, with potential extensions to non-linear settings.
Abstract
This paper targets a variant of the stochastic multi-armed bandit problem called good arm identification (GAI). GAI is a pure-exploration bandit problem whose goal is to output as many good arms as possible using as few samples as possible, where a good arm is defined as an arm whose expected reward is greater than a given threshold. In this work, we propose DGAI, a differentiable good arm identification algorithm that improves the sample complexity of the state-of-the-art HDoC algorithm in a data-driven fashion. We also show that DGAI can further boost performance on a general multi-armed bandit (MAB) problem when a threshold is given as prior knowledge about the arm set. Extensive experiments confirm that our algorithm significantly outperforms the baseline algorithms on both synthetic and real-world datasets for both GAI and MAB tasks.
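To make the GAI setting concrete, the following is a minimal sketch of a fixed-confidence-bound loop in the spirit of UCB-based baselines such as HDoC. It is not DGAI and not the authors' code: the function name `identify_good_arms`, the Bernoulli reward model, and the specific Hoeffding-style bound widths are assumptions chosen for illustration. An arm is declared good once its lower confidence bound reaches the threshold and is discarded once its upper confidence bound falls below it.

```python
import math
import random


def identify_good_arms(means, xi, delta=0.05, max_steps=100_000, seed=0):
    """Toy GAI loop on simulated Bernoulli arms (illustrative assumptions only)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # sum of observed rewards per arm
    active = set(range(k))
    good = []
    total = 0

    def pull(i):
        nonlocal total
        sums[i] += 1.0 if rng.random() < means[i] else 0.0
        counts[i] += 1
        total += 1

    # Pull every arm once so that all empirical means are defined.
    for i in range(k):
        pull(i)

    while active and total < max_steps:
        # Sampling step: pull the active arm with the largest UCB index.
        i = max(
            active,
            key=lambda a: sums[a] / counts[a]
            + math.sqrt(math.log(total) / (2 * counts[a])),
        )
        pull(i)

        # Identification step: fixed (hand-designed) confidence width.
        mean = sums[i] / counts[i]
        width = math.sqrt(math.log(4 * k * counts[i] ** 2 / delta) / (2 * counts[i]))
        if mean - width >= xi:
            good.append(i)       # confidently above the threshold
            active.remove(i)
        elif mean + width < xi:
            active.remove(i)     # confidently below the threshold

    return good, total


good, total = identify_good_arms([0.9, 0.8, 0.2, 0.1], xi=0.5)
print("good arms:", sorted(good), "samples used:", total)
```

The confidence widths here are fixed formulas; the idea summarized above is that DGAI instead learns such bounds from data, which this sketch does not attempt to reproduce.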
