A Perturbation Approach to Unconstrained Linear Bandits

Andrew Jacobsen, Dorian Baudry, Shinji Ito, Nicolò Cesa-Bianchi

Abstract

We revisit the standard perturbation-based approach of Abernethy et al. (2008) in the context of unconstrained Bandit Linear Optimization (uBLO). We show the surprising result that in the unconstrained setting, this approach effectively reduces Bandit Linear Optimization (BLO) to a standard Online Linear Optimization (OLO) problem. Our framework improves on prior work in several ways. First, we derive expected-regret guarantees when our perturbation scheme is combined with comparator-adaptive OLO algorithms, leading to new insights about the impact of different adversarial models on the resulting comparator-adaptive rates. We also extend our analysis to dynamic regret, obtaining the optimal $\sqrt{P_T}$ path-length dependencies without prior knowledge of $P_T$. We then develop the first high-probability guarantees for both static and dynamic regret in uBLO. Finally, we discuss lower bounds on the static regret, and prove the folklore $\Omega(\sqrt{dT})$ rate for adversarial linear bandits on the unit Euclidean ball, which is of independent interest.

Paper Structure

This paper contains 39 sections, 31 theorems, 170 equations, 6 algorithms.

Key Result

Proposition 1

Let $H_t\in\mathbb{R}^{d\times d}$ be positive definite and let $v_{1},\ldots,v_{d}$ be an orthonormal basis of eigenvectors of $H_t$. Consider the set $\mathcal{S}=\{\sigma v_i:\ \sigma\in\{-1,1\},\ i\in[d]\}$. Let $s_t$ be sampled uniformly at random from $\mathcal{S}$, and define Then, the following hold where $\lambda_t$ is the eigenvalue of $H_t$ associated with the eigenvector $v_t$ sample
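The sampling step in Proposition 1 can be illustrated with a minimal sketch. The quantity introduced after "define" is missing from the text above, so the sketch covers only the construction of $\mathcal{S}$ and the draw of $s_t$. It also checks two elementary facts that follow from orthonormality alone: averaging over $\mathcal{S}$ gives $\mathbb{E}[s_t]=0$ and $\mathbb{E}[s_t s_t^\top]=I/d$. These identities are not claimed to be the statement of the proposition. The function names and the rotated 2-dimensional basis are hypothetical, chosen only for illustration.

```python
import math
import random


def signed_set(basis):
    """Return S = {sigma * v_i : sigma in {-1, +1}, i in [d]} as a list of vectors."""
    return [[sigma * x for x in v] for v in basis for sigma in (-1.0, 1.0)]


def sample_direction(basis, rng=random):
    """Draw s_t uniformly at random from S."""
    return rng.choice(signed_set(basis))


def mean_vector(vectors):
    """Exact average of the given vectors."""
    d = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(d)]


def second_moment(vectors):
    """Exact average of the outer products s s^T over the given vectors."""
    d = len(vectors[0])
    n = len(vectors)
    return [[sum(v[i] * v[j] for v in vectors) / n for j in range(d)]
            for i in range(d)]


# Assumption for illustration: an orthonormal basis in d = 2 obtained by
# rotating the standard basis. Any symmetric positive definite H_t with
# these eigenvectors would yield the same set S.
theta = 0.3
basis = [[math.cos(theta), math.sin(theta)],
         [-math.sin(theta), math.cos(theta)]]

S = signed_set(basis)          # 2d = 4 candidate directions
mean = mean_vector(S)          # zero vector, since each v_i appears with both signs
moment = second_moment(S)      # identity divided by d, by orthonormality
s_t = sample_direction(basis)  # one random draw from S
```

Because each eigenvector appears with both signs, the draw is symmetric around the origin, and the second moment is isotropic regardless of the eigenvalues of $H_t$.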

Theorems & Definitions (51)

  • Proposition 1
  • Corollary 1
  • Proposition 2
  • Theorem 3.1
  • Remark 1
  • Theorem 3.2
  • Theorem 3.3
  • Proposition 3
  • Theorem 4.1
  • Theorem 4.2
  • ...and 41 more