Near Optimal Pure Exploration in Logistic Bandits

Eduardo Ochoa Rivera, Ambuj Tewari

TL;DR

This work tackles pure exploration in logistic (GLM) bandits by introducing Log-TS, the first track-and-stop algorithm for general pure exploration in the logistic setting. It combines a concentration-based stopping rule with a forced-exploration-driven tracking sampler that follows an instance-specific lower-bound proportion of arm pulls, and it uses an MLE-based projection to control estimation errors via self-concordance. The authors derive an instance-specific lower bound for general pure exploration problems under logistic bandits and provide a tractable quadratic KL-based approximation, showing that Log-TS matches this bound asymptotically up to a logarithmic factor. Empirical results on best-arm identification and thresholding bandits demonstrate strong performance, especially as the number of arms grows, while avoiding warm-up phases thanks to forced exploration. The work advances near-optimal pure exploration in GLM bandits and lays groundwork for tighter tail bounds and online MLE updates in future research.
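The tracking sampler with forced exploration described above can be illustrated with a minimal sketch. It assumes the classic direct-tracking rule from the track-and-stop literature (force any arm pulled fewer than $\sqrt{t} - K/2$ times, otherwise pull the arm furthest below its target share); the exact rule used by Log-TS may differ. The names `next_arm`, `counts`, and `target_w` are hypothetical, and the target proportions are taken as given rather than computed from the lower bound.

```python
import math

def next_arm(counts, target_w):
    """Choose the next arm from pull counts and target proportions.

    Assumed rule (classic direct tracking, not necessarily the paper's):
    1. Forced exploration: if some arm has fewer than sqrt(t) - K/2 pulls,
       pull the least-pulled such arm.
    2. Tracking: otherwise pull the arm with the largest deficit
       t * w_a - N_a relative to its target proportion.
    """
    K = len(counts)
    t = sum(counts)
    threshold = math.sqrt(t) - K / 2
    starved = [a for a in range(K) if counts[a] < threshold]
    if starved:
        return min(starved, key=lambda a: counts[a])
    return max(range(K), key=lambda a: t * target_w[a] - counts[a])

def simulate(target_w, horizon):
    """Pull arms for `horizon` rounds and return the final pull counts."""
    counts = [0] * len(target_w)
    for _ in range(horizon):
        counts[next_arm(counts, target_w)] += 1
    return counts

if __name__ == "__main__":
    w = [0.6, 0.3, 0.1]  # illustrative target proportions
    final = simulate(w, 2000)
    print([c / 2000 for c in final])
```

With fixed targets, the empirical proportions settle close to `target_w`, while the forced-exploration branch guarantees every arm is sampled from the start without a separate warm-up phase.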

Abstract

Bandit algorithms have garnered significant attention due to their practical applications in real-world scenarios. However, beyond simple settings such as multi-armed or linear bandits, optimal algorithms remain scarce. Notably, no optimal solution exists for pure exploration problems in the context of generalized linear model (GLM) bandits. In this paper, we narrow this gap and develop the first track-and-stop algorithm for general pure exploration problems in logistic bandits, called logistic track-and-stop (Log-TS). Log-TS is an efficient algorithm that asymptotically matches an approximation for the instance-specific lower bound of the expected sample complexity up to a logarithmic factor.

Paper Structure

This paper contains 33 sections, 21 theorems, 134 equations, 2 figures, 2 tables, 1 algorithm.

Key Result

Lemma 3.1

Let $\delta \in (0,1]$ and $\lambda(t) > 0$ for $t \ge 1$. If there exists $t_0 \ge 1$ such that $\lambda_{\min}(\mathbf{H}_t(\theta_*)) > \lambda(t)$ for all $t \ge t_0$, then with probability at least $1-\delta$: where $\gamma_t(\delta) := \frac{\sqrt{\lambda(t)}}{2} + \frac{4}{\sqrt{\lambda(t)}} \log\left(\frac{2^d}{\delta}\left(\frac{L t}{\lambda(t) d}\right)^{\frac{d}{2}}\right)$
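As a numerical illustration, the radius $\gamma_t(\delta)$ defined in the lemma can be evaluated directly. The sketch below is an assumption-laden helper, not code from the paper: the function name `gamma_t` and its argument names are hypothetical, `lam` stands for the value $\lambda(t)$ at the given $t$, and `L` is the constant appearing in the formula, whose definition is not given in this excerpt. The logarithm is expanded into a sum to avoid overflow for large $d$.

```python
import math

def gamma_t(t, delta, lam, d, L):
    """Evaluate gamma_t(delta) as written in Lemma 3.1.

    gamma_t(delta) = sqrt(lam)/2
                     + (4/sqrt(lam)) * log( (2^d / delta) * (L*t / (lam*d))^(d/2) )

    `lam` is lambda(t) evaluated at t; `L` is the constant from the lemma
    (its meaning is not defined in this excerpt).
    """
    if not 0 < delta <= 1:
        raise ValueError("delta must lie in (0, 1]")
    if lam <= 0 or t < 1 or d < 1 or L <= 0:
        raise ValueError("lam, L must be positive and t, d at least 1")
    # log(2^d / delta) + (d/2) * log(L*t / (lam*d)), computed term by term
    log_term = d * math.log(2) - math.log(delta) + 0.5 * d * math.log(L * t / (lam * d))
    return math.sqrt(lam) / 2 + 4 / math.sqrt(lam) * log_term

if __name__ == "__main__":
    # Illustrative values only
    print(gamma_t(t=100, delta=0.05, lam=1.0, d=5, L=1.0))
```

For fixed $t$, $d$, $L$, and $\lambda(t)$, the value grows as $\delta$ shrinks, reflecting the wider confidence region needed for a higher confidence level.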

Figures (2)

  • Figure 1: Logarithm of sample complexity of the benchmark setup for BAI against dimension of the action space $\mathcal{X}$.
  • Figure 2: Logarithm of sample complexity of the benchmark setup for TBP against dimension of the action space $\mathcal{X}$.

Theorems & Definitions (36)

  • Lemma 3.1
  • Theorem 4.1
  • proof
  • Lemma 5.1
  • Lemma 5.2
  • Lemma 5.3
  • Lemma 5.4
  • Proposition 1
  • Theorem 6.1
  • Theorem 6.2
  • ...and 26 more