Self-Concordant Perturbations for Linear Bandits

Lucas Lévy, Jean-Lou Valeau, Arya Akhavan, Patrick Rebeschini

TL;DR

This work tackles adversarial linear bandits by unifying Follow-the-Regularized-Leader and Follow-the-Perturbed-Leader via a Bandits-GBPA framework, augmented with self-concordant perturbations that replicate barrier-induced smoothness in FTPL. The authors introduce SC-FTPL, a perturbation-based algorithm that concentrates exploration at extreme points and leverages self-concordant barriers to control regret. For the hypercube and Euclidean ball action sets, they construct explicit perturbations and prove regret bounds of $O(d\sqrt{n\ln n})$, with a $\sqrt{d}$ improvement over SCRiBLe on the hypercube and matching rates on the ball up to logarithmic factors, while achieving per-round complexity linear in $d$. They also highlight the heavy-tailed nature of self-concordant perturbations and provide a detailed complexity analysis. Overall, the work advances tractable, principled strategies for adversarial linear bandits across canonical geometries and opens avenues for extending self-concordant perturbations to broader convex sets and learning settings.
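The $\sqrt{d}$ gap on the hypercube can be read off from a short comparison. The sketch below assumes that the SCRiBLe regret bound has the general form $O(d\sqrt{\vartheta n\ln n})$, where $\vartheta$ is the parameter of the self-concordant barrier, and that $\vartheta$ is of order $d$ for the logarithmic barrier on the hypercube and equal to $1$ for the barrier $-\ln(1-\|x\|^2)$ on the ball. These forms are taken from the general literature on self-concordant barriers, not from this summary, so they should be treated as assumptions.

```latex
% Assumed SCRiBLe rate: d * sqrt(vartheta * n * ln n), vartheta = barrier parameter.
\begin{align*}
\text{Hypercube } (\vartheta \asymp d):\quad
  & d\sqrt{\vartheta\, n \ln n} \asymp d^{3/2}\sqrt{n \ln n},
  && \frac{d^{3/2}\sqrt{n\ln n}}{d\sqrt{n\ln n}} = \sqrt{d}, \\
\text{Euclidean ball } (\vartheta = 1):\quad
  & d\sqrt{\vartheta\, n \ln n} = d\sqrt{n \ln n},
  && \text{same rate as SC-FTPL.}
\end{align*}
```

Under these assumptions, the stated $O(d\sqrt{n\ln n})$ bound improves on SCRiBLe by a factor of $\sqrt{d}$ on the hypercube and coincides with it on the ball.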

Abstract

We study the adversarial linear bandits problem and present a unified algorithmic framework that bridges Follow-the-Regularized-Leader (FTRL) and Follow-the-Perturbed-Leader (FTPL) methods, extending the known connection between them from the full-information setting. Within this framework, we introduce self-concordant perturbations, a family of probability distributions that mirror the role of self-concordant barriers previously employed in the FTRL-based SCRiBLe algorithm. Using this idea, we design a novel FTPL-based algorithm that combines self-concordant regularization with efficient stochastic exploration. Our approach achieves a regret of $O(d\sqrt{n \ln n})$ on both the $d$-dimensional hypercube and the Euclidean ball. On the Euclidean ball, this matches the rate attained by existing self-concordant FTRL methods. For the hypercube, this represents a $\sqrt{d}$ improvement over these methods and matches the optimal bound up to logarithmic factors.
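For readers unfamiliar with Follow-the-Perturbed-Leader, the following minimal sketch may help fix ideas. It is not the paper's SC-FTPL algorithm: it uses plain Gaussian noise rather than a self-concordant perturbation, works in the full-information setting (the whole loss vector is observed each round), and assumes the hypercube is $[-1,1]^d$. The function names `ftpl_hypercube_action` and `run_ftpl` and the parameter `eta` are hypothetical. What it does illustrate is why FTPL on the hypercube plays extreme points and costs time linear in $d$ per round: the linear minimization reduces to a coordinate-wise sign.

```python
import random


def ftpl_hypercube_action(cum_loss, eta, rng):
    """Play argmin over [-1, 1]^d of <x, cum_loss + xi / eta>, xi ~ N(0, I).

    A linear function on the hypercube is minimized at a vertex, so the
    answer is the negated sign of each perturbed coordinate: O(d) work.
    """
    perturbed = [c + rng.gauss(0.0, 1.0) / eta for c in cum_loss]
    # Ties (p == 0) are broken arbitrarily in favour of +1.
    return [-1.0 if p > 0 else 1.0 for p in perturbed]


def run_ftpl(losses, eta, seed=0):
    """Run full-information FTPL; return (learner loss, best fixed-action loss)."""
    rng = random.Random(seed)
    d = len(losses[0])
    cum_loss = [0.0] * d
    total = 0.0
    for loss in losses:
        x = ftpl_hypercube_action(cum_loss, eta, rng)
        total += sum(xi * li for xi, li in zip(x, loss))
        # Full information: the entire loss vector is revealed after playing.
        cum_loss = [c + li for c, li in zip(cum_loss, loss)]
    # Best fixed vertex in hindsight is -sign(cum_loss), with loss -||cum_loss||_1.
    best = -sum(abs(c) for c in cum_loss)
    return total, best


if __name__ == "__main__":
    losses = [[1.0, -1.0, 0.5]] * 100
    total, best = run_ftpl(losses, eta=1.0)
    print("learner loss:", total, "best fixed action:", best)
```

In the bandit setting studied in the paper, only the scalar loss of the played action is observed, so the update of `cum_loss` would have to be replaced by an estimate of the loss vector; that step is deliberately left out of this sketch.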

Paper Structure

This paper contains 37 sections, 15 theorems, 164 equations, 2 algorithms.

Key Result

Lemma 1

Let $K$ be a convex body, $\mathcal{R}$ a self-concordant barrier on $K$, and $x\in\operatorname{int} K$. Then the Hessian of $\mathcal{R}$ at $x$ is positive definite, so we can define the local norm at $x$ as $\|y\|_{x}\coloneq \|y\|_{\nabla^2\mathcal{R}(x)}$ for all $y\in\mathbb{R}^d$. We also define the Dikin ellipsoid $W(x)\coloneq\{y\in\mathbb{R}^d : \|y-x\|_{x}<1\}$. Then $W(x)\subset K$ and, for all $y\in W(x)$, $\mathcal{R}(y)\le\mathcal{R}(x)+\langle\nabla\mathcal{R}(x),y-x\rangle+\rho(\|y-x\|_{x})$, where $\rho(t)\coloneq-\ln(1-t)-t$.
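A small numerical check can make the local norm and the Dikin ellipsoid concrete. The sketch below assumes the hypercube $[-1,1]^d$ with the logarithmic barrier $\mathcal{R}(x)=-\sum_i\left[\ln(1-x_i)+\ln(1+x_i)\right]$, which is one standard choice and not necessarily the one used in the paper. It also assumes the standard form of the bound, $\mathcal{R}(y)\le\mathcal{R}(x)+\langle\nabla\mathcal{R}(x),y-x\rangle+\rho(\|y-x\|_x)$ for $\|y-x\|_x<1$. All function names are hypothetical.

```python
import math
import random


def barrier(x):
    """Log-barrier R(x) = -sum(ln(1 - x_i) + ln(1 + x_i)) on (-1, 1)^d."""
    return -sum(math.log(1.0 - xi) + math.log(1.0 + xi) for xi in x)


def barrier_grad(x):
    """Gradient of the log-barrier, computed coordinate-wise."""
    return [1.0 / (1.0 - xi) - 1.0 / (1.0 + xi) for xi in x]


def barrier_hess_diag(x):
    """Diagonal of the Hessian (the Hessian is diagonal for this barrier)."""
    return [1.0 / (1.0 - xi) ** 2 + 1.0 / (1.0 + xi) ** 2 for xi in x]


def local_norm(y, x):
    """Local norm ||y||_x = sqrt(y^T Hessian(x) y)."""
    h = barrier_hess_diag(x)
    return math.sqrt(sum(hi * yi * yi for hi, yi in zip(h, y)))


def rho(t):
    """rho(t) = -ln(1 - t) - t, defined for t < 1."""
    return -math.log(1.0 - t) - t


def sample_in_dikin(x, radius, rng):
    """Return y with ||y - x||_x equal to radius (up to rounding), radius < 1."""
    g = [rng.gauss(0.0, 1.0) for _ in x]
    scale = radius / local_norm(g, x)
    return [xi + scale * gi for xi, gi in zip(x, g)]


def check_lemma(x, radius, rng):
    """Return (y lies inside the cube, slack of the upper bound at y)."""
    y = sample_in_dikin(x, radius, rng)
    diff = [yi - xi for yi, xi in zip(y, x)]
    t = local_norm(diff, x)
    inside_cube = all(abs(yi) < 1.0 for yi in y)
    linear = sum(gi * di for gi, di in zip(barrier_grad(x), diff))
    slack = barrier(x) + linear + rho(t) - barrier(y)
    return inside_cube, slack


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.9, -0.5, 0.0]
    for _ in range(5):
        print(check_lemma(x, rng.uniform(0.01, 0.99), rng))
```

Each sampled point should lie strictly inside the cube, and the slack should be nonnegative up to floating-point error. Near the boundary (here the coordinate at $0.9$) the Hessian entry is large, so the ellipsoid shrinks in that direction, which is how the local norm keeps $W(x)$ inside $K$.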

Theorems & Definitions (30)

  • Definition 1: Self-Concordant Barrier
  • Lemma 1: Local Norm and Dikin Ellipsoid
  • Lemma 2
  • Proposition 3: chewi
  • Definition 2: Self-Concordant Perturbation
  • Theorem 4: Regret of SC-FTPL on the Hypercube
  • Theorem 5: Regret of SC-FTPL on $\mathbb{B}^d$
  • Definition 3: Unbiased Sampling and Estimation Schemes
  • Lemma 6
  • Theorem 7
  • ...and 20 more