Adversarial Bandits against Arbitrary Strategies
Jung-hun Kim, Se-Young Yun
TL;DR
This work adopts the master-base framework using the online mirror descent method and provides a master-base algorithm with simple OMD, achieving $\tilde{O}(S^{1/2}K^{1/3}T^{2/3}T^{2/3})$, in which $T^{2/3}$ comes from the variance of loss estimators.
Abstract
We study the adversarial bandit problem against arbitrary strategies, where the difficulty is captured by an unknown parameter $S$, which is the number of switches in the best arm in hindsight. To handle this problem, we adopt the master-base framework using the online mirror descent method (OMD). We first provide a master-base algorithm with simple OMD, achieving $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$, in which $T^{2/3}$ comes from the variance of loss estimators. To mitigate the impact of the variance, we propose using adaptive learning rates for OMD and achieve $\tilde{O}(\min\{\sqrt{SKTρ},S\sqrt{KT}\})$, where $ρ$ is a variance term for loss estimators.
