Adversarial bandit optimization for approximately linear functions

Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto

TL;DR

This work studies bandit optimization for losses that are $\epsilon$-approximately linear, i.e., a linear term plus an adversarial perturbation, in a nonconvex, non-smooth setting. It extends SCRiBLe with lifting and a $\nu$-normal barrier to achieve both expected and high-probability regret bounds, and recovers improved high-probability guarantees for bandit linear optimization when $\epsilon = 0$. A novel regret decomposition separates Reg-Term, Deviation-Term, and Error-Term, enabling tighter probabilistic bounds and a natural OB/BB transformation to black-box optimization. Experiments on synthetic data illustrate favorable performance against prior SCRiBLe variants, supporting the theoretical improvements.
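As a reading aid, the setting described above can be written out as follows. The notation here ($K$ for the decision set, $\ell_t$ for the loss vector, $\sigma_t$ for the perturbation, $x_t$ for the player's choice) is assumed for illustration and may differ from the symbols used in the paper itself.

```latex
% Assumed notation: K is the decision set, \ell_t the loss vector,
% \sigma_t the perturbation, x_t the point played in trial t.
\[
  f_t(x) = \langle \ell_t, x \rangle + \sigma_t(x),
  \qquad \sup_{x \in K} \lvert \sigma_t(x) \rvert \le \epsilon ,
\]
\[
  \mathrm{Regret}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x).
\]
% With \epsilon = 0 the perturbation vanishes and f_t is exactly linear,
% which is the bandit linear optimization special case.
```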

Abstract

We consider a bandit optimization problem for nonconvex and non-smooth functions, where in each trial the loss function is the sum of a linear function and a small but arbitrary perturbation chosen after observing the player's choice. We give both expected and high-probability regret bounds for the problem. Our result also implies an improved high-probability regret bound for bandit linear optimization, the special case with no perturbation. We also give a lower bound on the expected regret.
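The interaction protocol described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: the decision set is taken to be the Euclidean unit ball, the player is a placeholder that plays random points, and the names `simulate`, `loss_vectors`, and `sigma` are hypothetical. The perturbation is picked after the player's point is fixed and is bounded by `eps` in absolute value.

```python
import numpy as np


def simulate(T=200, d=3, eps=0.05, seed=0):
    """Play T rounds of an approximately linear bandit game (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Hypothetical adversary: loss vectors scaled to lie in the unit ball.
    loss_vectors = rng.uniform(-1.0, 1.0, size=(T, d))
    norms = np.linalg.norm(loss_vectors, axis=1, keepdims=True)
    loss_vectors = loss_vectors / np.maximum(1.0, norms)

    total_loss = 0.0   # what the player actually suffers: linear part + perturbation
    linear_loss = 0.0  # linear part only, kept for comparison

    for t in range(T):
        # Placeholder player (NOT SCRiBLe): a random point in the unit ball.
        x = rng.normal(size=d)
        x = x / max(1.0, np.linalg.norm(x))

        # Perturbation chosen after x is known; non-smooth, with |sigma| <= eps.
        sigma = eps * np.sign(np.sin(50.0 * x.sum()))

        linear = float(loss_vectors[t] @ x)
        total_loss += linear + sigma  # bandit feedback: only this scalar is revealed
        linear_loss += linear

    # Best fixed point for the linear part on the unit ball is -s/||s||,
    # whose cumulative linear loss is -||s||, where s is the sum of loss vectors.
    best_linear = -float(np.linalg.norm(loss_vectors.sum(axis=0)))
    return total_loss, linear_loss, best_linear


total, linear, best = simulate()
print("cumulative loss:", total)
print("linear regret of the placeholder player:", linear - best)
```

Because each perturbation is bounded by `eps`, the cumulative loss differs from its linear part by at most `eps * T`, and the regret measured on the perturbed losses differs from the linear regret by at most `2 * eps * T`.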

Paper Structure

This paper contains 24 sections, 14 theorems, 51 equations, 1 figure, 1 algorithm.

Key Result

Theorem

The algorithm with parameters $\eta = \frac{\sqrt{\nu\ln \frac{1}{\delta}}}{2d\sqrt{T}}$ and guarantees the following expected regret bound
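To get a feel for the scale of the learning rate in the statement above, the formula for $\eta$ can be evaluated numerically. The sketch below only computes $\eta$ from $\nu$, $d$, $T$, and $\delta$ as written; the function name `learning_rate` is hypothetical, and the remaining parameter and the bound itself, which are cut off in the statement, are not reproduced here.

```python
import math


def learning_rate(nu: float, d: int, T: int, delta: float) -> float:
    """Evaluate eta = sqrt(nu * ln(1/delta)) / (2 * d * sqrt(T))."""
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must lie strictly between 0 and 1")
    if nu <= 0 or d <= 0 or T <= 0:
        raise ValueError("nu, d and T must be positive")
    return math.sqrt(nu * math.log(1.0 / delta)) / (2.0 * d * math.sqrt(T))


# Example values chosen purely for illustration.
for T in (100, 10_000, 1_000_000):
    print(T, learning_rate(nu=4.0, d=2, T=T, delta=0.05))
```

As the formula indicates, $\eta$ shrinks like $1/\sqrt{T}$ and $1/d$, and grows with $\sqrt{\nu}$ and $\sqrt{\ln(1/\delta)}$.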

Figures (1)

  • Figure 1: Average cumulative loss of algorithms for artificial data sets. The blue line corresponds to the results of our algorithm, the yellow line represents the algorithm of [lee2020bias], and the green line corresponds to the SCRiBLe algorithm [abernethy2008competing].

Theorems & Definitions (23)

  • Definition
  • Theorem
  • Theorem
  • Definition
  • Lemma [nemirovski2004interior, nesterov1994interior]
  • Lemma [nemirovski2004interior]
  • Lemma [hazan2016introduction]
  • Lemma
  • Lemma
  • Lemma
  • ...and 13 more