Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update

Yu-Jie Zhang, Sheng-An Xu, Peng Zhao, Masashi Sugiyama

TL;DR

This work addresses generalized linear bandits by introducing GLB-OMD, a jointly efficient algorithm that achieves nearly optimal regret with constant-time, constant-space updates per round. The key insight is constructing tight confidence sets for an online mirror descent estimator using a novel mix-loss analysis, which enables an optimistic (OFU) strategy without storing all past data or solving expensive MLEs. Theoretical results show a leading regret term of order $\tilde{O}(d\sqrt{T/\kappa_*})$, with per-round computation and memory independent of $T$, and experiments demonstrate dramatic computational savings (e.g., up to ~1000x speed-ups) while maintaining competitive regret. The approach extends to unbounded GLMs (e.g., Poisson) and offers practical implications for scalable contextual decision-making under non-linear rewards.
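
The optimistic (OFU) selection step described in this summary can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes an ellipsoidal confidence set $\{\theta : \|\theta - \hat{\theta}\|_{H} \le \beta\}$ around the online estimate and an increasing link function. The names `select_arm_ofu`, `theta_hat`, `H`, and `beta` are hypothetical, and the exact set and radius used by GLB-OMD are not reproduced here.

```python
import numpy as np

def select_arm_ofu(arms, theta_hat, H, beta):
    """Pick the arm with the largest optimistic linear score.

    Assumption (not from the source): the confidence set is the ellipsoid
    {theta : ||theta - theta_hat||_H <= beta}. Over that set, the maximum of
    x @ theta equals x @ theta_hat + beta * ||x||_{H^{-1}}. If the link
    function is increasing, the arm maximizing this score also maximizes
    the optimistic mean reward.
    """
    arms = np.asarray(arms, dtype=float)            # shape (K, d)
    H_inv_arms = np.linalg.solve(H, arms.T).T       # row k is H^{-1} x_k
    widths = np.sqrt(np.einsum("ij,ij->i", arms, H_inv_arms))
    scores = arms @ theta_hat + beta * widths
    return int(np.argmax(scores)), scores

# Hypothetical example: arm 1 has a lower estimate but far more uncertainty.
idx, scores = select_arm_ofu(
    arms=np.eye(2),
    theta_hat=np.array([0.5, 0.4]),
    H=np.diag([1.0, 0.01]),
    beta=0.1,
)
```

In this toy example the exploration bonus dominates, so the less-explored arm is chosen even though its point estimate is smaller.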

Abstract

We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function, thereby modeling a broad class of reward distributions such as Bernoulli and Poisson. While GLBs are widely applicable to real-world scenarios, their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency. Existing methods typically trade off between two objectives, either incurring high per-round costs for optimal regret guarantees or compromising statistical efficiency to enable constant-time updates. In this paper, we propose a jointly efficient algorithm that attains a nearly optimal regret bound with $\mathcal{O}(1)$ time and space complexities per round. The core of our method is a tight confidence set for the online mirror descent (OMD) estimator, which is derived through a novel analysis that leverages the notion of mix loss from online prediction. The analysis shows that our OMD estimator, even with its one-pass updates, achieves statistical efficiency comparable to maximum likelihood estimation, thereby leading to a jointly efficient optimistic method.
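
To make the idea of a one-pass update with per-round cost independent of $T$ concrete, here is a minimal sketch for a logistic model. It is a simplified online Newton-style step swapped in for illustration, not the paper's OMD estimator, whose exact update is not reproduced in this summary. The class name `OnePassLogisticEstimator`, the parameters `lam`, `eta`, and `radius`, and the Euclidean projection are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnePassLogisticEstimator:
    """Keeps only theta (d numbers) and H^{-1} (d x d); no past data is stored."""

    def __init__(self, dim, lam=1.0, eta=1.0, radius=1.0):
        self.theta = np.zeros(dim)
        self.H_inv = np.eye(dim) / lam   # inverse of H = lam * I
        self.eta = eta                   # step size (assumed value)
        self.radius = radius             # norm bound on theta (assumed value)

    def update(self, x, r):
        """Process one (feature, reward) pair in O(d^2) time, then discard it."""
        mu = sigmoid(x @ self.theta)
        curv = mu * (1.0 - mu)           # local curvature of the logistic loss
        # Sherman-Morrison: inverse of (H + curv * x x^T) from the inverse of H.
        Hx = self.H_inv @ x
        self.H_inv -= (curv / (1.0 + curv * (x @ Hx))) * np.outer(Hx, Hx)
        # Preconditioned gradient step on the current loss only.
        grad = (mu - r) * x
        self.theta = self.theta - self.eta * (self.H_inv @ grad)
        # Simplification: Euclidean projection onto the ball of the given radius.
        norm = np.linalg.norm(self.theta)
        if norm > self.radius:
            self.theta *= self.radius / norm
        return self.theta

est = OnePassLogisticEstimator(dim=2)
est.update(np.array([1.0, 0.0]), r=1.0)
```

Each call touches only the current sample, so time and memory stay at $O(d^2)$ regardless of how many rounds have elapsed, in contrast to refitting a maximum likelihood estimate on all past data.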

Paper Structure

This paper contains 30 sections, 16 theorems, 93 equations, 5 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Let $\delta \in (0,1]$. Set the step size to $\eta = 1 + RS$ and the regularization parameter to $\lambda = \max\{ 14d\eta R^2, 6\eta RS L_{\mu}/g(\tau) \}$. For each $t\in[T]$, define the confidence set as […], where $\theta_t$ is the online estimator (eq:online-estimator) and the radius $\beta_t(\delta)$ is given by […]. Then, under Assumptions (assum:bounded-domain), (assum:kappa), and (assum:self-concordant), …
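
The parameter choices stated in the theorem can be evaluated directly. The sketch below only transcribes the two formulas for $\eta$ and $\lambda$; the function name `glb_omd_parameters` is hypothetical, and since $R$, $S$, $L_{\mu}$, and $g(\tau)$ are not defined in this excerpt, they are treated as given positive inputs. The numbers in the example are made up for illustration.

```python
def glb_omd_parameters(d, R, S, L_mu, g_tau):
    """Return (eta, lam) as set in Theorem 1.

    eta = 1 + R * S
    lam = max(14 * d * eta * R^2, 6 * eta * R * S * L_mu / g(tau))
    All inputs are assumed to be supplied by the problem setup.
    """
    eta = 1.0 + R * S
    lam = max(14.0 * d * eta * R**2, 6.0 * eta * R * S * L_mu / g_tau)
    return eta, lam

# Hypothetical values: d = 10, R = 1, S = 1, L_mu = 0.25, g(tau) = 1.
eta, lam = glb_omd_parameters(d=10, R=1.0, S=1.0, L_mu=0.25, g_tau=1.0)
```

With these illustrative inputs, the first term of the maximum, which grows linearly in $d$, determines $\lambda$.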

Figures (5)

  • Figure 1: Regret and running time comparison of different algorithms on logistic bandits.
  • Figure 2: Regret and running time comparison of different algorithms on Poisson bandits.
  • Figure 3: Confidence region of parameter estimation.
  • Figure 4: Regret and running time dependence on $S$.
  • Figure 5: Performance comparison of different algorithms on Cover Type data.

Theorems & Definitions (28)

  • Remark 1: Unbounded GLMs
  • Theorem 1
  • Remark 2: Comparison with Logistic Bandits Literature
  • Theorem 2
  • Lemma 1
  • Lemma 2 (informal)
  • Lemma 3 (informal)
  • Proof of Theorem \ref{thm:confidence-set}
  • Lemma 4
  • Proof of Lemma \ref{lem:regret-to-set}
  • ...and 18 more