No-Regret Gaussian Process Optimization of Time-Varying Functions
Eliabelle Mauduit, Eloïse Berthier, Andrea Simonetto
TL;DR
This paper tackles the challenge of tracking time-varying rewards in Gaussian process bandits under a frequentist RKHS framework. It introduces uncertainty injection to model temporal drift and proves that pure bandit feedback yields sublinear regret only in very slow variation regimes (α < 1/3), while faster variation requires additional information. The authors present W-SparQ-GP-UCB, a windowed algorithm that achieves no-regret with a vanishing average number of additional queries, and provide tight upper and lower bounds on both the regret and the number of queries across three α regimes. The lower bounds are information-theoretic (Fano/KL arguments) and illuminate the trade-off between data freshness and query cost. Synthetic and real-data experiments show substantial improvements in regret and query efficiency. Overall, the work offers a principled, scalable approach to time-varying GP optimization with quantifiable limits on information usage, and opens avenues for adaptive query strategies and broader kernel families.
Abstract
Sequential optimization of black-box functions from noisy evaluations has been widely studied, with Gaussian Process bandit algorithms such as GP-UCB guaranteeing no-regret in stationary settings. However, for time-varying objectives, it is known that no-regret is unattainable under pure bandit feedback unless strong and often unrealistic assumptions are imposed. In this article, we propose a novel method to optimize time-varying rewards in the frequentist setting, where the objective has bounded RKHS norm. Time variations are captured through uncertainty injection (UI), which enables heteroscedastic GP regression that adapts past observations to the current time step. As no-regret is unattainable in general in the strict bandit setting, we relax the latter by allowing additional queries on previously observed points. Building on sparse inference and the effect of UI on regret, we propose W-SparQ-GP-UCB, an online algorithm that achieves no-regret with only a vanishing number of additional queries per iteration. To assess the theoretical limits of this approach, we establish a lower bound on the number of additional queries required for no-regret, which shows that our method is efficient. Finally, we provide a comprehensive analysis linking the degree of time-variation of the function to achievable regret rates, together with upper and lower bounds on the number of additional queries needed in each regime.
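To make the idea of uncertainty injection concrete, here is a minimal sketch of GP-UCB with UI-style heteroscedastic regression. It rests on several assumptions that are not taken from the paper: an observation made at time s is treated at time t as having noise variance `noise_var + eps * (t - s)` (a linear-in-age inflation), the kernel is a squared-exponential on scalars, candidates form a finite grid, `beta` is a fixed constant, and an optional sliding `window` discards old points. All names (`ui_posterior`, `ucb_select`, `eps`, `window`) are hypothetical. The sketch omits the sparse inference and the additional queries on previously observed points, so it is not W-SparQ-GP-UCB itself.

```python
import math


def rbf(x, y, lengthscale=0.2):
    """Squared-exponential kernel on scalars (assumed kernel choice)."""
    return math.exp(-((x - y) ** 2) / (2 * lengthscale ** 2))


def cholesky(a):
    """Return lower-triangular L with L L^T = a for a symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def ui_posterior(history, t, x, noise_var=0.01, eps=0.005, window=None):
    """Posterior mean and variance of f_t(x) from (s, x_s, y_s) triples.

    Hypothetical uncertainty injection: an observation from time s gets
    noise variance noise_var + eps * (t - s) when predicting at time t,
    so older data is trusted less (heteroscedastic GP regression).
    """
    if window is not None:
        # Optional sliding window: drop observations older than `window` steps.
        history = [h for h in history if t - h[0] <= window]
    if not history:
        return 0.0, rbf(x, x)  # zero-mean prior
    xs = [h[1] for h in history]
    ys = [h[2] for h in history]
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) for j in range(n)] for i in range(n)]
    for i, (s, _, _) in enumerate(history):
        K[i][i] += noise_var + eps * (t - s)  # age-dependent noise on the diagonal
    L = cholesky(K)
    k_star = [rbf(xi, x) for xi in xs]
    v = solve_lower(L, k_star)  # v = L^{-1} k_*
    w = solve_lower(L, ys)  # w = L^{-1} y
    mean = sum(vi * wi for vi, wi in zip(v, w))  # k_*^T K^{-1} y
    var = rbf(x, x) - sum(vi * vi for vi in v)  # k(x,x) - k_*^T K^{-1} k_*
    return mean, max(var, 0.0)


def ucb_select(history, t, candidates, beta=2.0, **kw):
    """Return the candidate maximizing mean + sqrt(beta) * std at time t."""
    def score(x):
        m, v = ui_posterior(history, t, x, **kw)
        return m + math.sqrt(beta) * math.sqrt(v)
    return max(candidates, key=score)


if __name__ == "__main__":
    # Toy drifting objective (hypothetical): the peak moves right over time.
    def f(t, x):
        return math.sin(2 * math.pi * (x - 0.01 * t))

    grid = [i / 20 for i in range(21)]
    history = []
    for t in range(30):
        x_t = ucb_select(history, t, grid, window=15)
        history.append((t, x_t, f(t, x_t)))  # noise-free observations for simplicity
    print(history[-1])
```

Because the injected variance grows with the age t − s, the posterior at a previously observed point relaxes back toward the prior as time passes, which raises its UCB score and eventually makes re-querying attractive. This is the intuition behind the relaxation that allows additional queries on previously observed points.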
