Near Optimal Pure Exploration in Logistic Bandits
Eduardo Ochoa Rivera, Ambuj Tewari
TL;DR
This work tackles pure exploration in logistic (GLM) bandits by introducing Log-TS, the first track-and-stop algorithm for general pure exploration in the logistic setting. It combines a concentration-based stopping rule with a tracking sampler that uses forced exploration and follows the arm-pull proportions prescribed by an instance-specific lower bound, and it uses an MLE-based projection to control estimation errors via self-concordance. The authors derive an instance-specific lower bound for general pure exploration problems in logistic bandits and provide a tractable quadratic KL-based approximation of it, showing that Log-TS asymptotically matches this approximation up to a logarithmic factor. Empirical results on best-arm identification and thresholding bandits demonstrate strong performance, especially as the number of arms grows, and forced exploration removes the need for a warm-up phase. The work advances near-optimal pure exploration in GLM bandits and lays groundwork for tighter tail bounds and online MLE updates in future research.
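The tracking sampler with forced exploration mentioned above can be illustrated with a small sketch. This is not the paper's sampler: it assumes the standard direct-tracking rule from multi-armed track-and-stop, with a `sqrt(t) - K/2` forced-exploration threshold, and it takes the target proportions as a fixed input, whereas Log-TS would recompute them from the current MLE. The names `next_arm`, `simulate`, `counts`, and `target` are hypothetical.

```python
import math


def next_arm(counts, target):
    """Choose the next arm from pull counts and target proportions.

    Assumption: direct tracking as in multi-armed track-and-stop,
    not the exact rule used by Log-TS.
    """
    t = sum(counts)
    k = len(counts)
    # Forced exploration: an arm pulled fewer than sqrt(t) - k/2 times
    # is considered under-sampled and is pulled immediately.
    threshold = math.sqrt(t) - k / 2
    least = min(range(k), key=lambda a: counts[a])
    if counts[least] < threshold:
        return least
    # Tracking: pull the arm whose count lags its target share the most.
    return max(range(k), key=lambda a: t * target[a] - counts[a])


def simulate(target, horizon):
    """Run the sampling rule for `horizon` rounds with fixed targets."""
    counts = [0] * len(target)
    for _ in range(horizon):
        counts[next_arm(counts, target)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical target proportions, chosen only for illustration.
    print(simulate([0.7, 0.2, 0.1], 1000))
```

With fixed targets, the empirical proportions stay close to `target`, while the forced-exploration branch keeps every arm's count growing on the order of `sqrt(t)` even when its target share is zero. That guaranteed minimum sampling is what lets an estimator stabilise without a separate warm-up phase.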
Abstract
Bandit algorithms have garnered significant attention due to their practical applications in real-world scenarios. However, beyond simple settings such as multi-armed or linear bandits, optimal algorithms remain scarce. Notably, no optimal solution exists for pure exploration problems in the context of generalized linear model (GLM) bandits. In this paper, we narrow this gap and develop the first track-and-stop algorithm for general pure exploration problems in logistic bandits, which we call logistic track-and-stop (Log-TS). Log-TS is an efficient algorithm that asymptotically matches an approximation of the instance-specific lower bound on the expected sample complexity, up to a logarithmic factor.
