Achieving Optimal Static and Dynamic Regret Simultaneously in Bandits with Deterministic Losses

Jian Qian; Chen-Yu Wei

TL;DR

This work resolves, for adversarial bandits with deterministic losses and an oblivious adversary, the long-standing question of whether optimal static and dynamic regret can be achieved simultaneously. By combining Blackwell approachability with a mechanism that exploits negative static regret, the authors design a general algorithm that attains $\widetilde{O}(\sqrt{AT})$ static regret and $\widetilde{O}(\sqrt{SAT})$ dynamic regret against an oblivious adversary. Here $A$ is the number of arms, $T$ is the horizon, and the comparator sequence switches at most $S-1$ times. The authors also extend the adaptive-adversary impossibility result of Marinov and Zimmert [2021] to deterministic losses. Together, the results reveal a fundamental separation between oblivious and adaptive adversaries in multi-armed bandits when multiple regret benchmarks are optimized together. They also connect to the broader open problem of simultaneously handling switching benchmarks with different numbers of switches. The approach uses vector-valued online learning to obtain scalar regret guarantees, yielding a new model-selection-style procedure for bandits that could be of independent interest in sequential decision-making. Overall, the paper advances our understanding of best-of-all-worlds performance in non-stationary, adversarial environments.

Abstract

In adversarial multi-armed bandits, two performance measures are commonly used: static regret, which compares the learner to the best fixed arm, and dynamic regret, which compares it to the best sequence of arms. Optimal algorithms are known for each measure individually, but no known algorithm achieves optimal bounds for both simultaneously. Marinov and Zimmert [2021] first showed that such simultaneous optimality is impossible against an adaptive adversary. Our work takes a first step toward demonstrating its possibility against an oblivious adversary when losses are deterministic. First, we extend the impossibility result of Marinov and Zimmert [2021] to the case of deterministic losses. Then, we present an algorithm achieving optimal static and dynamic regret simultaneously against an oblivious adversary. Together, these results reveal a fundamental separation between adaptive and oblivious adversaries when multiple regret benchmarks are considered simultaneously. They also provide new insight into the long-open problem of simultaneously achieving optimal regret against switching benchmarks with different numbers of switches. Our algorithm uses negative static regret to compensate for the exploration overhead incurred when controlling dynamic regret, and it leverages Blackwell approachability to jointly control both regrets. This yields a new model selection procedure for bandits that may be of independent interest.
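The following minimal Python sketch makes the two benchmarks concrete. It works with a fixed deterministic loss matrix, which an oblivious adversary commits to in advance, and a fixed sequence of learner actions. It only evaluates regret after the fact and is not the paper's algorithm. The function names are hypothetical. It also assumes that the dynamic benchmark is the best arm sequence with at most $S-1$ switches, matching the switch count in the Key Result. Passing `num_segments = T` recovers the unrestricted best sequence of arms.

```python
from typing import Sequence


def best_switching_loss(losses: Sequence[Sequence[float]], max_switches: int) -> float:
    """Smallest cumulative loss over arm sequences with at most `max_switches` switches.

    Hypothetical helper: losses[t][a] is the deterministic loss of arm a at round t.
    """
    num_arms = len(losses[0])
    # dp[j][a]: best cumulative loss of a sequence ending at arm a
    # that has used at most j switches so far.
    dp = [list(losses[0]) for _ in range(max_switches + 1)]
    for row in losses[1:]:
        new_dp = []
        for j in range(max_switches + 1):
            # Arriving at arm a from any arm with at most j - 1 switches is allowed:
            # the switch into a (if there is one) spends the remaining unit of budget.
            best_switch_in = min(dp[j - 1]) if j >= 1 else float("inf")
            new_dp.append(
                [row[a] + min(dp[j][a], best_switch_in) for a in range(num_arms)]
            )
        dp = new_dp
    return min(dp[max_switches])


def learner_loss(losses: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """Cumulative loss of the arms the learner actually played."""
    return sum(row[a] for row, a in zip(losses, actions))


def static_regret(losses, actions) -> float:
    # Comparator: best single arm in hindsight (zero switches).
    return learner_loss(losses, actions) - best_switching_loss(losses, 0)


def dynamic_regret(losses, actions, num_segments: int) -> float:
    # Comparator: best arm sequence with at most num_segments - 1 switches.
    return learner_loss(losses, actions) - best_switching_loss(losses, num_segments - 1)
```

Consider the two-arm loss matrix `[[0, 1], [0, 1], [1, 0], [1, 0]]`. Always playing arm 0 gives static regret 0 but dynamic regret 2 with `num_segments = 2`. Playing `[0, 0, 1, 1]` instead gives dynamic regret 0 and static regret -2. Tracking a changing best arm can therefore drive static regret negative, which is the kind of slack the abstract refers to when it uses negative static regret to offset exploration overhead.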

Paper Structure (57 sections, 22 theorems, 144 equations, 3 algorithms)

Introduction
Preliminaries
Notation
Problem Setup
Paper organization
A Lower Bound for Adaptive Adversary
Opportunities with an Oblivious Adversary
Warm-Up: Beating Oblivious Adversary in the Lower Bound Example
Problem setup
Epoch-level strategies
Regret Upper Bounds under $\times$ and $\circ$
A Multi-Objective Formulation
Checking approachability under full information
A Recipe for Approachability Algorithm Design
Choice of $(p_{k,\times}, p_{k,\circ})$ and Bound on $\sum_k \theta_k^\top \hat{v}_k$
...and 42 more sections

Key Result

Theorem 1

For any $S\geqslant 2$, there is a deterministic multi-armed bandit problem with no more than $S-1$ switches such that any algorithm must suffer $\text{\rm DReg} \geqslant \Omega(\sqrt{SAT})$ against an oblivious adversary.
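For reference, the two regret notions can be written out explicitly. This is a sketch in assumed notation: $\ell_t(a)$ is the deterministic loss of arm $a \in [A]$ at round $t$, $a_t$ is the arm the learner plays, and $\mathrm{Reg}$ denotes static regret. The dynamic comparator is restricted to at most $S-1$ switches, as in Theorem 1. The paper's own definitions may differ in details, for example by taking expectations over the learner's internal randomization.

$$
\mathrm{Reg} = \sum_{t=1}^{T} \ell_t(a_t) - \min_{a \in [A]} \sum_{t=1}^{T} \ell_t(a),
\qquad
\mathrm{DReg} = \sum_{t=1}^{T} \ell_t(a_t) - \min_{\substack{(a_1^\star, \dots, a_T^\star) \in [A]^T \\ \sum_{t=2}^{T} \mathbb{1}[a_t^\star \neq a_{t-1}^\star] \leqslant S-1}} \sum_{t=1}^{T} \ell_t(a_t^\star).
$$

Taking $S = 1$ forces a fixed comparator arm, so $\mathrm{DReg} = \mathrm{Reg}$ in that case. Because $\sqrt{SAT} = \sqrt{S}\cdot\sqrt{AT}$, Theorem 1 shows that dynamic regret must exceed the static rate by a factor of order $\sqrt{S}$. The $\widetilde{O}(\sqrt{SAT})$ dynamic bound is therefore optimal up to logarithmic factors.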

Theorems & Definitions (39)

Theorem 1: Theorem 4.1 of Wei et al. [2016]
Theorem 2: Hardness with an adaptive adversary
proof
Theorem 3
Theorem 4
proof: Proof of the lower bound for general $S$
Lemma 1: Freedman's inequality (Lemma A.3 of Foster et al. [2021])
Lemma 2: Lemma 1 of Neu [2015]
Lemma 3: Lemma 20 of Dann et al. [2023]
Lemma 4: Lemma 21 of Dann et al. [2023]
...and 29 more
