Catoni-Style Change Point Detection for Regret Minimization in Non-Stationary Heavy-Tailed Bandits
Gianmarco Genalti, Sujay Bhatt, Nicola Gatti, Alberto Maria Metelli
TL;DR
This work tackles regret minimization in heavy-tailed piecewise-stationary bandits (HTPS MABs) by introducing Catoni-FCS-detector, a Catoni-style change-point detector based on confidence sequences, and Robust-CPD-UCB, a meta-algorithm that combines a stationary regret minimizer with change-point detection (CPD) and cyclic exploration. The authors prove a minimax lower bound for HTPS bandits and establish near-optimal regret guarantees for Robust-CPD-UCB in both the instance-dependent and instance-independent regimes; they also show that detection delays remain finite under infinite variance. The framework is validated through experiments on synthetic and real-world data, including five financial/crypto datasets, showing robust performance under heavy tails and frequent changes. The results advance practical regret minimization in non-stationary, heavy-tailed environments such as finance and communications.
Abstract
Regret minimization in stochastic non-stationary bandits has gained popularity over the last decade, as it can model a broad class of real-world problems, from advertising to recommendation systems. Existing literature relies on various assumptions about the reward-generating process, such as Bernoulli or subgaussian rewards. However, in settings such as finance and telecommunications, heavy-tailed distributions naturally arise. In this work, we tackle the heavy-tailed piecewise-stationary bandit problem. Heavy-tailed bandits, introduced by Bubeck et al. (2013), operate on the minimal assumption that the absolute centered moments of order $1+\epsilon$ are finite and uniformly bounded by a constant $v<+\infty$, for some $\epsilon\in (0,1]$. We focus on the most popular non-stationary bandit setting, i.e., the piecewise-stationary setting, in which the means of the reward-generating distributions may change at unknown time steps. We provide a novel Catoni-style change-point detection strategy tailored for heavy-tailed distributions, which relies on recent advancements in the theory of sequential estimation and is of independent interest. We introduce Robust-CPD-UCB, which combines this change-point detection strategy with optimistic algorithms for bandits, and we provide both its regret upper bound and an impossibility result on the minimum attainable regret for any policy. Finally, we validate our approach through numerical experiments on synthetic and real-world datasets.
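To give a rough sense of what "Catoni-style" robust estimation means, the following is a minimal sketch of the classic Catoni M-estimator of the mean in the finite-variance case (the case $\epsilon = 1$ of the moment assumption above). It is not the Catoni-FCS-detector or any part of Robust-CPD-UCB; the function names `catoni_psi` and `catoni_mean`, the scale parameter `alpha`, and the bisection solver are illustrative assumptions rather than the authors' implementation.

```python
import math


def catoni_psi(x: float) -> float:
    """Catoni's influence function: sign(x) * log(1 + |x| + x^2 / 2).

    It grows only logarithmically, so extreme samples have limited pull.
    """
    return math.copysign(math.log1p(abs(x) + 0.5 * x * x), x)


def catoni_mean(samples, alpha, tol=1e-9, max_iter=200):
    """Return the root theta of sum_i psi(alpha * (x_i - theta)) = 0.

    `alpha` > 0 is a hypothetical scale parameter chosen by the caller.
    The score is non-negative at min(samples), non-positive at
    max(samples), and decreasing in theta, so bisection on that
    interval converges to the root.
    """
    lo, hi = min(samples), max(samples)

    def score(theta: float) -> float:
        return sum(catoni_psi(alpha * (x - theta)) for x in samples)

    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


# Illustrative data (assumed): 99 ordinary rewards and one extreme outlier.
rewards = [0.0] * 99 + [1000.0]
robust_estimate = catoni_mean(rewards, alpha=0.1)
sample_mean = sum(rewards) / len(rewards)
```

On this toy sample the single outlier pulls the empirical mean up to 10, while the Catoni estimate stays close to the bulk of the data. Handling $\epsilon < 1$ requires a different influence function, which this sketch does not attempt.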
