Adversarial Bandits against Arbitrary Strategies

Jung-hun Kim, Se-Young Yun

TL;DR

This work adopts the master-base framework with online mirror descent (OMD) and provides a master-base algorithm with simple OMD that achieves a regret bound of $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$, where the $T^{2/3}$ factor comes from the variance of the loss estimators.

Abstract

We study the adversarial bandit problem against arbitrary strategies, where the difficulty is captured by an unknown parameter $S$, the number of switches of the best arm in hindsight. To handle this problem, we adopt the master-base framework using the online mirror descent (OMD) method. We first provide a master-base algorithm with simple OMD, achieving a regret bound of $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$, in which the $T^{2/3}$ factor comes from the variance of the loss estimators. To mitigate the impact of this variance, we propose using adaptive learning rates for OMD and achieve $\tilde{O}(\min\{\sqrt{SKT\rho},S\sqrt{KT}\})$, where $\rho$ is a variance term for the loss estimators.
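To make the role of the loss estimators concrete, the following is a minimal sketch of standard Exp3, which is OMD with a negative-entropy regularizer and importance-weighted loss estimates. It is not the paper's master-base algorithm; the function names `count_switches` and `exp3`, the fixed learning rate `eta`, and the loss-table input format are assumptions made for illustration. The estimate `loss / prob` is unbiased, but its variance grows as the sampling probability shrinks, which is the kind of variance the abstract refers to.

```python
import math
import random


def count_switches(arms):
    """Count how often consecutive entries of a comparator arm sequence differ.

    This mirrors the switch parameter S for a given sequence of arms.
    """
    return sum(1 for a, b in zip(arms, arms[1:]) if a != b)


def exp3(losses, eta, seed=0):
    """Run Exp3 (OMD with negative entropy) on a T x K table of losses in [0, 1].

    Illustrative sketch only: a fixed learning rate `eta` is assumed.
    Returns the list of pulled arms and the total incurred loss.
    """
    rng = random.Random(seed)
    num_arms = len(losses[0])
    cum_est = [0.0] * num_arms  # cumulative importance-weighted loss estimates
    pulled, total = [], 0.0
    for loss_t in losses:
        # Exponential-weights distribution; subtract the minimum for stability.
        base = min(cum_est)
        weights = [math.exp(-eta * (c - base)) for c in cum_est]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        total += loss_t[arm]
        pulled.append(arm)
        # Only the pulled arm is observed; dividing by its probability keeps
        # the estimate unbiased but inflates it when the probability is small.
        cum_est[arm] += loss_t[arm] / probs[arm]
    return pulled, total
```

As a usage note, `count_switches([0, 0, 1, 1, 0])` counts two switches, and `exp3` can be called on any list of per-round loss lists, for example a table in which the best arm changes halfway through the horizon.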

Paper Structure

This paper contains 17 sections, 7 theorems, 101 equations, 2 algorithms.

Key Result

Theorem 3.1

For any switch number $S\in[T-1]$, Algorithm 1 achieves a regret bound of

Theorems & Definitions (17)

  • Theorem 3.1
  • proof : Proof Sketch
  • Theorem 3.2
  • proof : Proof Sketch
  • Lemma 3.3
  • Remark 3.4
  • Remark 3.5
  • Remark 3.6: Implementation
  • Remark 3.7
  • Lemma A.1: Theorem 28.4 and Eq. 28.11 in Lattimore and Szepesvári (2020)
  • ...and 7 more