Rising Rested MAB with Linear Drift

Omer Amichay; Yishay Mansour

TL;DR

This work studies a non-stationary rising rested MAB in which each arm's mean reward grows linearly with the number of times the arm has been pulled, formalized as $\mu_i(n)=L_i n+b_i$ with $L_i\ge0$. The authors prove a tight regret bound of $\tilde{\Theta}(T^{4/5}K^{3/5})$ by giving both upper and lower bounds, and add instance-dependent refinements. They introduce the R-ed-EE explore-exploit algorithm, which achieves $O\left(T^{4/5}(\Phi K)^{3/5}\ln(\Phi KT)^{1/5}\right)$ regret, and two instance-dependent algorithms, R-ed-AE and HR-re-AE, whose bounds adapt to the problem parameters; they also establish a near-matching lower bound of $\Omega(K^{3/5}T^{4/5})$. An important takeaway is that, unlike stationary stochastic MAB, the rising linear-drift setting incurs substantially higher regret, and the horizon-unknown case incurs linear regret even under favorable conditions. The results offer a principled understanding of exploration-exploitation in changing environments and pave the way for further study of hybrid rising/rotting drift models.

Abstract

We consider non-stationary multi-arm bandit (MAB) where the expected reward of each action follows a linear function of the number of times we executed the action. Our main result is a tight regret bound of $\tilde{\Theta}(T^{4/5}K^{3/5})$, by providing both upper and lower bounds. We extend our results to derive instance dependent regret bounds, which depend on the unknown parametrization of the linear drift of the rewards.
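To make the setting concrete, the following is a minimal sketch of a rested bandit environment with linear drift $\mu_i(n)=L_i n+b_i$. It is an illustration only, not the authors' code: the class name `LinearDriftRestedBandit`, the Gaussian noise model, and the convention that the first pull of an arm uses $n=1$ are all assumptions made here.

```python
import random


class LinearDriftRestedBandit:
    """Hypothetical rested bandit: arm i has mean L_i * n + b_i on its n-th pull.

    Assumptions (not taken from the paper): Gaussian noise, and pull
    indices that start at n = 1.
    """

    def __init__(self, slopes, intercepts, noise_std=0.1, seed=0):
        if len(slopes) != len(intercepts):
            raise ValueError("slopes and intercepts must have the same length")
        if any(L < 0 for L in slopes):
            raise ValueError("rising setting requires L_i >= 0")
        self.slopes = list(slopes)
        self.intercepts = list(intercepts)
        self.noise_std = noise_std
        # "Rested": each arm keeps its own counter, advanced only when pulled.
        self.pull_counts = [0] * len(self.slopes)
        self._rng = random.Random(seed)

    def mean(self, arm, n):
        """Expected reward of `arm` on its n-th pull."""
        return self.slopes[arm] * n + self.intercepts[arm]

    def pull(self, arm):
        """Pull `arm` once and return a noisy reward."""
        self.pull_counts[arm] += 1
        n = self.pull_counts[arm]
        return self.mean(arm, n) + self._rng.gauss(0.0, self.noise_std)


# Usage: with zero noise the rewards follow the drift exactly.
env = LinearDriftRestedBandit([0.5, 0.0], [1.0, 2.0], noise_std=0.0)
rewards_arm0 = [env.pull(0) for _ in range(3)]
reward_arm1 = env.pull(1)
```

Because the counters are per arm, pulling arm 1 leaves the mean of arm 0 unchanged, which is what distinguishes the rested setting from one where rewards drift with global time.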

Paper Structure (44 sections, 22 theorems, 62 equations, 1 figure, 1 table, 3 algorithms)

Introduction
Our results
Related Works
Stochastic stationary MAB
Non-Stationary MAB
Rising rested MAB
Problem Setting and Preliminaries
Non-stationary rested $K$-MAB
Rested MAB with linear drift
Policies and regret
Characterization of the optimal policy
Notations
Algorithm for Rising Rested MAB with Linear Drift
Overview of regret analysis
Instance Dependent Upper bound for Rising Rested MAB with Linear Drift
...and 29 more sections

Key Result

Corollary 2

For Rising Rested MAB with Linear Drift, the dynamic regret is equal to the static regret. Namely, the optimal policy always plays arm $i^*$.
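The corollary can be checked numerically on small instances. The sketch below is not from the paper: it assumes pulls are indexed $n=1,2,\dots$, so that $m$ pulls of an arm yield an expected total of $L\,m(m+1)/2 + b\,m$, and it uses made-up arm parameters. It brute-forces every split of $T$ pulls across the arms and compares the best split with the best single arm.

```python
from itertools import product


def cumulative_reward(L, b, m):
    """Expected total reward of m pulls when the n-th pull has mean L*n + b (n = 1..m)."""
    return L * m * (m + 1) / 2 + b * m


def best_allocation(arms, T):
    """Brute-force the pull counts (summing to T) that maximize expected total reward."""
    K = len(arms)
    best_value, best_counts = float("-inf"), None
    for counts in product(range(T + 1), repeat=K - 1):
        last = T - sum(counts)
        if last < 0:
            continue  # this split uses more than T pulls
        full = counts + (last,)
        value = sum(cumulative_reward(L, b, m) for (L, b), m in zip(arms, full))
        if value > best_value:
            best_value, best_counts = value, full
    return best_value, best_counts


def best_single_arm(arms, T):
    """Index and expected total reward of the best arm when played for all T rounds."""
    values = [cumulative_reward(L, b, T) for L, b in arms]
    i_star = max(range(len(arms)), key=lambda i: values[i])
    return i_star, values[i_star]


# Hypothetical (L_i, b_i) pairs, chosen only for illustration.
arms = [(0.010, 0.50), (0.030, 0.10), (0.000, 0.80)]

short_value, short_counts = best_allocation(arms, T=30)
short_star, short_single = best_single_arm(arms, T=30)

long_value, long_counts = best_allocation(arms, T=100)
long_star, long_single = best_single_arm(arms, T=100)
```

In this example the best split always puts all $T$ pulls on one arm, consistent with the corollary: each arm's cumulative reward is convex in its pull count when $L_i\ge 0$, so the maximum over splits is attained at an extreme point. The example also shows that which arm is $i^*$ can change with the horizon (the flat arm for $T=30$, the steepest arm for $T=100$).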

Figures (1)

Figure 1: Sample figure caption.

Theorems & Definitions (51)

Remark 1
Corollary 2
Definition 3
Lemma 4
Theorem 5
proof
Definition 6
Definition 7
Lemma 8
Lemma 9
...and 41 more
