Adversarial Bandits against Arbitrary Strategies

Jung-hun Kim; Se-Young Yun

TL;DR

This work adopts the master-base framework with online mirror descent (OMD) and provides a master-base algorithm with simple OMD that achieves $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$ regret, where the $T^{2/3}$ factor comes from the variance of the loss estimators.

Abstract

We study the adversarial bandit problem against arbitrary strategies, where the difficulty is captured by an unknown parameter $S$, the number of switches in the best arm in hindsight. To handle this problem, we adopt the master-base framework using the online mirror descent method (OMD). We first provide a master-base algorithm with simple OMD, achieving $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$, in which $T^{2/3}$ comes from the variance of loss estimators. To mitigate the impact of the variance, we propose using adaptive learning rates for OMD and achieve $\tilde{O}(\min\{\sqrt{SKT\rho},S\sqrt{KT}\})$, where $\rho$ is a variance term for loss estimators.
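
To make the setting concrete, the following is a minimal sketch of a single OMD learner with a negative-entropy regularizer and a fixed learning rate (i.e. plain Exp3), run on a loss sequence whose best arm switches $S$ times. It is not the master-base algorithm of the paper: the function names `omd_exp3` and `switching_losses`, the synthetic environment, and the learning rate are assumptions chosen for illustration. The sketch shows the importance-weighted loss estimator $\ell_t(a)/p_t(a)$, whose second moment scales as $1/p_t(a)$; this is the kind of estimator variance the abstract refers to.

```python
import numpy as np


def omd_exp3(losses, eta, rng):
    """OMD with a negative-entropy regularizer (Exp3) on a T x K loss matrix.

    Returns the total loss incurred by the learner.
    """
    T, K = losses.shape
    cum_est = np.zeros(K)  # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for t in range(T):
        # Closed-form OMD step on the simplex: p_t(a) ∝ exp(-eta * cum_est[a]).
        # Subtracting the minimum keeps the exponent non-positive (stability).
        w = np.exp(-eta * (cum_est - cum_est.min()))
        p = w / w.sum()
        arm = rng.choice(K, p=p)
        loss = losses[t, arm]
        total_loss += loss
        # Unbiased estimator: only the pulled arm gets a nonzero estimate.
        cum_est[arm] += loss / p[arm]
    return total_loss


def switching_losses(T, K, S, rng):
    """Hypothetical environment: the best arm changes exactly S times."""
    losses = rng.uniform(0.5, 1.0, size=(T, K))
    edges = np.linspace(0, T, S + 2).astype(int)  # S + 1 segments
    best_arms = np.empty(T, dtype=int)
    best = int(rng.integers(K))
    for i in range(S + 1):
        if i > 0:
            # Move to a different arm so each boundary is a genuine switch.
            best = (best + 1 + int(rng.integers(K - 1))) % K
        lo, hi = edges[i], edges[i + 1]
        losses[lo:hi, best] = 0.1  # the best arm has a clearly smaller loss
        best_arms[lo:hi] = best
    return losses, best_arms


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, K, S = 2000, 5, 3
    losses, best_arms = switching_losses(T, K, S, rng)
    learner_loss = omd_exp3(losses, eta=np.sqrt(np.log(K) / (K * T)), rng=rng)
    # Benchmark: the sequence of best arms with S switches.
    benchmark = losses[np.arange(T), best_arms].sum()
    print("regret against the switching benchmark:", learner_loss - benchmark)
```

A single learner with a fixed learning rate tuned for a static comparator adapts slowly after each switch, which is one motivation for combining several base learners under a master when $S$ is unknown.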

Paper Structure (17 sections, 7 theorems, 101 equations, 2 algorithms)

Introduction
Problem Statement
Algorithms and Regret Analysis
Master-Base Framework
Online Mirror Descent (OMD)
Master-Base OMD
Regret from the near-optimal base.
Regret from the master.
Overall Regret.
Master-Base OMD with Adaptive Learning Rates
Regret from the near-optimal base.
Regret from the master.
Overall Regret.
Conclusion
Appendix
...and 2 more sections

Key Result

Theorem 3.1

For any switch number $S\in[T-1]$, Algorithm 1 achieves a regret bound of $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$.

Theorems & Definitions (17)

Theorem 3.1
proof : Proof Sketch
Theorem 3.2
proof : Proof Sketch
Lemma 3.3
Remark 3.4
Remark 3.5
Remark 3.6: Implementation
Remark 3.7
Lemma A.1: Theorem 28.4 and Eq. 28.11 in Lattimore and Szepesvári (2020)
...and 7 more
