Non-Stationary Lipschitz Bandits
Nicolas Nguyen, Solenne Gaucher, Claire Vernade
TL;DR
This work studies non-stationary Lipschitz bandits over a continuous action space, where the reward function $\mu_t$ can change arbitrarily over time but remains Lipschitz in the action. The authors introduce MDBE, a multi-depth bin elimination algorithm that discretizes the action space hierarchically, runs replays at multiple scales, and eliminates suboptimal regions to track significant shifts adaptively, without prior knowledge of the non-stationarity. They prove a minimax-optimal dynamic regret bound $\mathbb{E}[R(\pi_{\mathrm{MDBE}},T)] = \widetilde{\mathcal{O}}(\tilde{L}^{1/3} T^{2/3})$, where $\tilde{L}$ is the number of significant shifts and $T$ is the horizon, and provide matching lower bounds along with extensions to Hölder and multi-dimensional settings. The paper also discusses future work on scalability and practical deployment. Overall, this work delivers the first optimal guarantees for non-stationary Lipschitz bandits and introduces a scale-aware adaptation mechanism for continuous-action exploration.
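As a rough heuristic for where this rate comes from (a standard back-of-the-envelope argument, not the paper's proof), assume a 1-Lipschitz reward on $[0,1]$, a uniform discretization into $K$ bins, and roughly $\tilde{L}$ stationary phases of equal length $T/\tilde{L}$. The symbol $K$ and the equal-length assumption are introduced here for illustration only, and logarithmic factors are ignored.

$$
\underbrace{\sqrt{KT}}_{\text{learning over } K \text{ bins}} + \underbrace{T/K}_{\text{discretization error}}
\;\Longrightarrow\; K \asymp T^{1/3}, \quad \text{regret} \asymp T^{2/3},
\qquad
\tilde{L}\left(\frac{T}{\tilde{L}}\right)^{2/3} = \tilde{L}^{1/3}\, T^{2/3}.
$$

Balancing the two terms gives the stationary $T^{2/3}$ rate, and summing that rate over the phases recovers the $\tilde{L}^{1/3} T^{2/3}$ dependence stated above.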
Abstract
We study the problem of non-stationary Lipschitz bandits, where the number of actions is infinite and the reward function, satisfying a Lipschitz assumption, can change arbitrarily over time. We design an algorithm that adaptively tracks the recently introduced notion of significant shifts, defined by large deviations of the cumulative reward function. To detect such reward changes, our algorithm leverages a hierarchical discretization of the action space. Without requiring any prior knowledge of the non-stationarity, our algorithm achieves a minimax-optimal dynamic regret bound of $\widetilde{\mathcal{O}}(\tilde{L}^{1/3}T^{2/3})$, where $\tilde{L}$ is the number of significant shifts and $T$ the horizon. This result provides the first optimal guarantee in this setting.
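To make the idea of a hierarchical discretization concrete, here is a minimal sketch assuming the action space is $[0,1]$ and that each depth halves the bin width (a dyadic partition). This is an illustrative assumption, not the paper's MDBE algorithm; the function names `dyadic_bins`, `bin_index`, `parent_index`, and `discretization_error` are hypothetical.

```python
from typing import List, Tuple


def dyadic_bins(depth: int) -> List[Tuple[float, float]]:
    """Split [0, 1] into 2**depth equal-width bins (illustrative assumption)."""
    n = 2 ** depth
    return [(i / n, (i + 1) / n) for i in range(n)]


def bin_index(x: float, depth: int) -> int:
    """Index of the bin at the given depth that contains action x in [0, 1]."""
    n = 2 ** depth
    return min(int(x * n), n - 1)  # clamp so x == 1.0 lands in the last bin


def parent_index(index: int) -> int:
    """Index of the enclosing bin one level coarser."""
    return index // 2


def discretization_error(depth: int, lipschitz: float = 1.0) -> float:
    """Maximum reward variation inside one bin for a Lipschitz reward function."""
    return lipschitz * 2.0 ** (-depth)


if __name__ == "__main__":
    for depth in range(4):
        print(depth, len(dyadic_bins(depth)), discretization_error(depth))
```

Coarse bins are cheap to estimate but hide up to `discretization_error(depth)` of reward variation, while deeper bins resolve finer differences at the cost of more samples. Working at several depths at once is one way to trade these off.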
