Identifying the Best Arm in the Presence of Global Environment Shifts
Phurinut Srisawad, Juergen Branke, Long Tran-Thanh
TL;DR
The paper addresses best-arm identification (BAI) under global environment shifts, where the mean reward of arm i in environment j satisfies μ_{ij} = μ_i + s_j and environments are piecewise stationary. It reframes the problem as a regression task and introduces an OLS-based selection approach together with LinLUCB, an allocation policy that integrates regression uncertainty into a confidence bound. Key contributions include (i) an unbiased, tractable OLS estimator for arm means and environment shifts with higher-order covariance structure, (ii) a regression-informed, LUCB-like allocation that enforces two distinct samples per environment, and (iii) extensive empirical evidence showing that LinLUCB outperforms standard policies and Reduce-to-MAB baselines across multiple non-stationary settings. The work demonstrates that exploiting the global-shift structure yields practical performance gains in non-stationary BAI and provides a foundation for future extensions that relax the assumptions on shift patterns and noise heterogeneity.
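To make the regression view concrete, the following is a minimal sketch of how arm means and environment shifts could be estimated jointly by ordinary least squares under the additive model μ_{ij} = μ_i + s_j. It is an illustration only, not the paper's implementation and not LinLUCB: the function name `ols_arm_estimates` is hypothetical, and fixing the first environment's shift to zero is an assumption made here so that the parameters are identifiable (adding a constant to every μ_i and subtracting it from every s_j leaves all rewards unchanged). The toy data are noiseless and invented for the example.

```python
import numpy as np


def ols_arm_estimates(arms, envs, rewards, n_arms, n_envs):
    """Hypothetical helper: OLS fit of reward = mu[arm] + s[env] + noise.

    Assumption (not from the paper): environment 0 is the reference, s[0] = 0.
    Returns (mu_hat, s_hat) with s_hat[0] == 0.
    """
    arms = np.asarray(arms, dtype=int)
    envs = np.asarray(envs, dtype=int)
    rewards = np.asarray(rewards, dtype=float)
    n = len(rewards)

    # One indicator column per arm, plus one per non-reference environment.
    X = np.zeros((n, n_arms + n_envs - 1))
    rows = np.arange(n)
    X[rows, arms] = 1.0
    later = envs > 0
    X[rows[later], n_arms + envs[later] - 1] = 1.0

    # Least-squares solution of X @ theta ~= rewards.
    theta, *_ = np.linalg.lstsq(X, rewards, rcond=None)
    mu_hat = theta[:n_arms]
    s_hat = np.concatenate(([0.0], theta[n_arms:]))
    return mu_hat, s_hat


# Toy, noiseless data: 3 arms, 3 environments, two distinct arms per environment.
true_mu = np.array([1.0, 2.0, 1.5])   # arm 1 is the best arm
true_s = np.array([0.0, 5.0, -3.0])   # global shift of each environment
arms = np.array([0, 1, 0, 2, 1, 2])
envs = np.array([0, 0, 1, 1, 2, 2])
rewards = true_mu[arms] + true_s[envs]

mu_hat, s_hat = ols_arm_estimates(arms, envs, rewards, n_arms=3, n_envs=3)
naive_means = np.array([rewards[arms == a].mean() for a in range(3)])

print("OLS arm estimates:", mu_hat, "-> best arm", int(np.argmax(mu_hat)))
print("Raw per-arm averages:", naive_means, "-> best arm", int(np.argmax(naive_means)))
```

In this toy example the raw per-arm averages favour arm 0, because it happened to be sampled in the environment with the large positive shift, whereas the regression separates the shift from the arm effect and ranks arm 1 first. The design also shows why sampling at least two distinct arms per environment matters: an environment in which only one arm is pulled gives no information for separating its shift from that arm's mean.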
Abstract
This paper formulates a new Best-Arm Identification problem in the non-stationary stochastic bandit setting, where the means of all arms are shifted in the same way by a global influence of the environment. The aim is to identify the unique best arm across environmental changes given a fixed total budget. While this setting can be regarded as a special case of Adversarial Bandits or Corrupted Bandits, we demonstrate that existing solutions tailored to those settings do not fully utilise the nature of this global influence and thus do not work well in practice, despite their theoretical guarantees. To overcome this issue, we develop a novel selection policy that is consistent and robust to global environmental shifts. We then propose an allocation policy, LinLUCB, which exploits information about global shifts across all arms in each environment. Empirical tests show that our policies significantly outperform existing methods.
