A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits

Junghyun Lee, Se-Young Yun, Kwang-Sung Jun

TL;DR

This work develops a unified likelihood-ratio confidence sequence (CS) for convex generalized linear models (GLMs), deriving a time-uniform, convex, and numerically tight CS that applies to Gaussian, Bernoulli, Poisson, and other GLMs. It introduces an ellipsoidal CS for self-concordant GLMs and a PAC-Bayes proof using a uniform prior/posterior, yielding practical, implementable constants. Building on this CS, the paper proposes OFUGLB, a generic optimistic UCB-style algorithm for generalized linear bandits (GLBs) that achieves state-of-the-art regret bounds and is poly(S)-free in leading terms for bounded GLBs, including logistic bandits. A novel regret analysis bypasses the traditional self-concordance control lemma, providing a fresh analytical tool for GLB analyses. Empirically, OFUGLB demonstrates superior performance in logistic bandits, validating the theoretical improvements and highlighting practical applicability to sequential decision-making under GLMs.

Abstract

We present a unified likelihood ratio-based confidence sequence (CS) for any (self-concordant) generalized linear model (GLM) that is guaranteed to be convex and numerically tight. We show that this is on par with or improves upon known CSs for various GLMs, including Gaussian, Bernoulli, and Poisson. In particular, for the first time, our CS for Bernoulli has a $\mathrm{poly}(S)$-free radius, where $S$ is the norm of the unknown parameter. Our first technical novelty is its derivation, which utilizes a time-uniform PAC-Bayesian bound with a uniform prior/posterior, despite the latter being a rather unpopular choice for deriving CSs. As a direct application of our new CS, we propose a simple and natural optimistic algorithm called OFUGLB, applicable to any generalized linear bandit (GLB; Filippi et al., 2010). Our analysis shows that the celebrated optimistic approach simultaneously attains state-of-the-art regrets for various self-concordant (not necessarily bounded) GLBs, and is even $\mathrm{poly}(S)$-free for bounded GLBs, including logistic bandits. The regret analysis, our second technical novelty, follows from combining our new CS with a new proof technique that completely avoids the previously widely used self-concordant control lemma (Faury et al., 2020, Lemma 9). Numerically, OFUGLB outperforms or is on par with prior algorithms for logistic bandits.

Paper Structure (56 sections, 20 theorems, 75 equations, 4 figures, 2 tables)

Key Result

Theorem 3.1

Let $L_t$ be the Lipschitz constant of ${\mathcal{L}}_t$ (if ${\mathcal{L}}_t$ is differentiable, one could apply Rademacher's theorem [geommeasure]): $L_t := \inf\left\{ L \geq 0 : |{\mathcal{L}}_t(\bm\theta) - {\mathcal{L}}_t(\bm\theta')| \leq L \left\lVert \bm\theta - \bm\theta' \right\rVert_2, \ \forall \bm\theta, \bm\theta' \right\}$ [...] where the last inequality follows from the choice $c_t = 1 \wedge \frac{d}{2S L_t}$.
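To make the likelihood-ratio idea concrete, the following is a minimal Python sketch for the Bernoulli (logistic) case. It assumes the confidence set collects every parameter whose negative log-likelihood exceeds that of the norm-constrained MLE by at most a squared radius. The function names (`logistic_nll`, `fit_mle`, `in_confidence_set`) and the argument `beta_sq` are hypothetical, and `beta_sq` is a placeholder supplied by the caller, not the radius derived in the paper. The MLE is approximated with plain projected gradient descent, which is an illustrative choice rather than the authors' solver.

```python
import numpy as np


def logistic_nll(theta, X, y):
    """Negative log-likelihood of a Bernoulli GLM with the logistic link."""
    z = X @ theta
    # sum of log(1 + exp(z)) - y * z, computed stably via logaddexp
    return float(np.sum(np.logaddexp(0.0, z) - y * z))


def fit_mle(X, y, S, steps=2000, lr=0.1):
    """Approximate the MLE over the ball ||theta||_2 <= S (assumed constraint set)
    with projected gradient descent on the averaged negative log-likelihood."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))  # predicted means
        grad = X.T @ (p - y) / len(y)           # gradient of the averaged NLL
        theta = theta - lr * grad
        norm = np.linalg.norm(theta)
        if norm > S:                            # project back onto the ball
            theta = theta * (S / norm)
    return theta


def in_confidence_set(theta, theta_hat, X, y, beta_sq):
    """Likelihood-ratio membership test: NLL(theta) - NLL(theta_hat) <= beta_sq.
    beta_sq is a hypothetical placeholder radius, not the paper's formula."""
    return logistic_nll(theta, X, y) - logistic_nll(theta_hat, X, y) <= beta_sq
```

Because the negative log-likelihood of a GLM is convex in the parameter, a set defined by this test is a sublevel set of a convex function and is therefore convex, which is the property the abstract highlights.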

Figures (4)

  • Figure 1: Time-varying arm-sets. (First row) Regret plots of all considered algorithms. (Second row) Magnified regret plots. (Third row) Confidence set plots at the final time $t = 10000$ when applicable. Each column represents a different logistic bandit instance for $S \in \{4, 6, 8, 10\}$.
  • Figure 2: Fixed arm-set. (First row) Regret plots of all considered algorithms. (Second row) Magnified regret plots. (Third row) Confidence set plots at the final time $t = 10000$ when applicable. Each column represents a different logistic bandit instance for $S \in \{4, 6, 8, 10\}$.
  • Figure 3: Time-varying arm-sets with $|{\mathcal{A}}_t| = 10$.
  • Figure 4: Fixed arm-set with $|{\mathcal{A}}| = 10$.

Theorems & Definitions (38)

  • Theorem 3.1: Unified CS for GLMs
  • Remark 1: Generality of our Unified CS
  • Theorem 3.2: Ellipsoidal CS for Self-Concordant GLMs
  • Lemma 3.1
  • proof
  • Lemma 3.2: Theorem 2.1 of donsker1983kl
  • Lemma 3.3
  • proof
  • Remark 2: Choice of KL
  • Remark 3: Use of Regularized MLE?
  • ...and 28 more