Adversarial Bandit Optimization with Globally Bounded Perturbations to Linear Losses

Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto

Abstract

We study a class of adversarial bandit optimization problems in which the loss functions may be non-convex and non-smooth. In each round, the learner observes a loss that consists of an underlying linear component together with an additional perturbation applied after the learner selects an action. The perturbations are measured relative to the linear losses and are constrained by a global budget that bounds their cumulative magnitude over time. Under this model, we establish both expected and high-probability regret guarantees. As a special case of our analysis, we recover an improved high-probability regret bound for classical bandit linear optimization, which corresponds to the setting without perturbations. We further complement our upper bounds by proving a lower bound on the expected regret.
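Read literally, the abstract's loss model can be sketched as follows. The symbols $\theta_t$, $\epsilon_t$, and the constants $c_t$ are assumed notation, not taken from the paper; Definition 1's "$C$-approximately linear function sequence" suggests that $C$ is the global budget and that the per-round perturbation is measured relative to the linear loss:

$$\ell_t(x) \;=\; \langle \theta_t, x \rangle + \epsilon_t(x), \qquad |\epsilon_t(x)| \;\le\; c_t\,|\langle \theta_t, x \rangle|, \qquad \sum_{t=1}^{T} c_t \;\le\; C.$$

Under this reading, $C = 0$ recovers classical bandit linear optimization, which is consistent with the abstract's claim that the unperturbed setting is a special case.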

Paper Structure

This paper contains 19 sections, 12 theorems, 29 equations, 1 figure, 1 algorithm.

Key Result

Theorem 1

For any $\delta \in (0,1)$, the algorithm with parameter $\eta = \frac{\sqrt{\nu \ln \frac{1}{\delta}}}{2d\sqrt{T}}$ guarantees the following regret bound, which holds with probability at least $1-\delta$:
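The parameter setting in Theorem 1 is concrete enough to compute directly. A minimal sketch, assuming $\nu$ is the barrier parameter, $d$ the dimension, $T$ the horizon, and $\delta$ the confidence level (the function name `theorem1_eta` is ours, not the paper's):

```python
import math

def theorem1_eta(nu: float, d: int, T: int, delta: float) -> float:
    """Learning rate from Theorem 1: eta = sqrt(nu * ln(1/delta)) / (2 * d * sqrt(T)).

    nu    -- barrier parameter (assumed interpretation)
    d     -- dimension of the decision set
    T     -- number of rounds
    delta -- failure probability, in (0, 1)
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(nu * math.log(1.0 / delta)) / (2.0 * d * math.sqrt(T))
```

Note that $\eta$ shrinks as $T$ grows and as $\delta$ approaches 1, matching the usual $O(1/\sqrt{T})$ tuning in bandit linear optimization.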

Figures (1)

  • Figure 1: Average regret of algorithms on artificial data sets. The yellow line corresponds to the results of Algorithm 1; the blue line corresponds to the SCRiBLe algorithm (Abernethy et al., 2008).

Theorems & Definitions (19)

  • Definition 1: $C$-approximately linear function sequence
  • Theorem 1
  • Theorem 2
  • Definition 2 (Nemirovski, 2004; Nesterov & Nemirovskii, 1994)
  • Lemma 3 (Nemirovski, 2004)
  • Lemma 4 (Hazan, 2016)
  • Lemma 5
  • Lemma 6 (Abernethy et al., 2008)
  • Lemma 7
  • Lemma 8
  • ...and 9 more