Rising Rested MAB with Linear Drift
Omer Amichay, Yishay Mansour
TL;DR
This work studies a non-stationary, rested, rising multi-armed bandit (MAB) in which each arm's mean reward grows linearly with the number of times that arm has been pulled, formalized as $\mu_i(n)=L_i n+b_i$ with $L_i\ge0$. The authors prove a regret bound of $\tilde{\Theta}(T^{4/5}K^{3/5})$, tight up to logarithmic factors, by giving both upper and lower bounds, and they also derive instance-dependent refinements. They introduce the R-ed-EE explore-exploit algorithm, which achieves $O\left(T^{4/5}(\Phi K)^{3/5}\ln(\Phi KT)^{1/5}\right)$ regret, and two instance-dependent algorithms, R-ed-AE and HR-re-AE, whose bounds adapt to the problem parameters; they also establish a near-matching lower bound of $\Omega(K^{3/5}T^{4/5})$. An important takeaway is that, unlike the stationary stochastic MAB, the rising linear-drift setting incurs substantially higher regret, and when the horizon is unknown the regret is linear even under favorable conditions. The results offer a principled understanding of the exploration-exploitation trade-off in changing environments and pave the way for further study of hybrid rising/rotting drift models.
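To make the model concrete, here is a minimal Python sketch of the linear-drift rested setting together with a generic explore-then-commit baseline. It is not the authors' R-ed-EE algorithm, whose details are not given here. Several choices are our own assumptions: the pull made after $n$ previous pulls has mean $L_i n + b_i$, regret is measured against the best allocation of all $T$ pulls, and the names `total_mean_reward`, `best_fixed_arm_value`, `fit_line`, and `explore_then_commit` are hypothetical. Because $L_i \ge 0$ makes each arm's cumulative mean reward convex in its pull count, that best allocation puts every pull on a single arm.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical arm representation: (L, b) with slope L >= 0.
# Assumed indexing: the pull made after n previous pulls has mean L * n + b.
Arm = Tuple[float, float]


def total_mean_reward(arm: Arm, pulls: int) -> float:
    """Expected reward of pulling `arm` `pulls` times: sum_{n<pulls} (L*n + b)."""
    L, b = arm
    return b * pulls + L * pulls * (pulls - 1) / 2


def best_fixed_arm_value(arms: Sequence[Arm], T: int) -> float:
    """Benchmark: with L >= 0 the total reward is convex in the pull count,
    so the best split of T pulls puts all of them on a single arm."""
    return max(total_mean_reward(arm, T) for arm in arms)


def fit_line(ys: Sequence[float]) -> Arm:
    """Least-squares fit of ys[n] ~ L*n + b, with the slope clipped at 0."""
    m = len(ys)
    x_bar = (m - 1) / 2
    y_bar = sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in range(m))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    L_hat = max(0.0, sxy / sxx)
    return L_hat, y_bar - L_hat * x_bar


def explore_then_commit(arms: Sequence[Arm], T: int, m: int,
                        noise: float = 0.0, seed: int = 0) -> Tuple[int, float]:
    """Generic explore-then-commit sketch (NOT the paper's R-ed-EE).

    Pulls each arm m times, fits a line per arm, then commits the remaining
    pulls to the arm with the largest extrapolated reward. Returns the chosen
    arm and the expected (noise-free) reward collected.
    """
    K = len(arms)
    if m < 2 or K * m > T:
        raise ValueError("need m >= 2 and K * m <= T")
    rng = random.Random(seed)
    observed: List[List[float]] = []
    collected = 0.0
    for L, b in arms:
        # Noisy observations drive the estimate; expected rewards are tallied.
        observed.append([L * n + b + rng.gauss(0.0, noise) for n in range(m)])
        collected += total_mean_reward((L, b), m)
    remaining = T - K * m

    def projected(i: int) -> float:
        # Estimated reward of pulls m, ..., m + remaining - 1 on arm i.
        est = fit_line(observed[i])
        return total_mean_reward(est, m + remaining) - total_mean_reward(est, m)

    chosen = max(range(K), key=projected)
    arm = arms[chosen]
    collected += total_mean_reward(arm, m + remaining) - total_mean_reward(arm, m)
    return chosen, collected


if __name__ == "__main__":
    arms = [(0.0, 0.5), (8e-5, 0.15)]  # flat arm vs. slowly rising arm
    T = 10_000
    chosen, collected = explore_then_commit(arms, T, m=10)
    print(chosen, best_fixed_arm_value(arms, T) - collected)
```

In the noise-free usage example the rising arm is worth more over the full horizon even though it starts lower, so a policy that judged arms only by their early rewards would pick the wrong one. With noisy rewards, the least-squares slope estimate from $m$ samples has variance proportional to $1/\sum_n (n-\bar n)^2$, so small slopes such as $8\times10^{-5}$ need long exploration phases before they can be distinguished from zero.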
Abstract
We consider a non-stationary multi-armed bandit (MAB) in which the expected reward of each action is a linear function of the number of times the action has been executed. Our main result is a tight regret bound of $\tilde{\Theta}(T^{4/5}K^{3/5})$, established by providing both upper and lower bounds. We extend our results to derive instance-dependent regret bounds, which depend on the unknown parametrization of the linear drift of the rewards.
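For a sense of scale, the back-of-the-envelope comparison below contrasts this rate with the $\Theta(\sqrt{KT})$ minimax regret of the stationary stochastic MAB. It ignores constants and logarithmic factors, and the values $T=10^5$ and $K=32$ are illustrative choices of ours, not figures from the paper.

$$T^{4/5}K^{3/5} = (10^5)^{4/5}\cdot 32^{3/5} = 10^4\cdot 2^3 = 8\times 10^4, \qquad \sqrt{KT} = \sqrt{32\times 10^5} \approx 1.79\times 10^3.$$

At these values the linear-drift rate is roughly 45 times larger, although it remains sublinear: the average per-round regret $T^{-1/5}K^{3/5}$ still vanishes as $T\to\infty$.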
