Median Clipping for Zeroth-order Non-Smooth Convex Optimization and Multi-Armed Bandit Problem with Heavy-tailed Symmetric Noise
Nikita Kornilov, Yuriy Dorn, Aleksandr Lobanov, Nikolay Kutuzov, Innokentiy Shibaev, Eduard Gorbunov, Alexander Nazin, Alexander Gasnikov
TL;DR
The paper tackles non-smooth convex zeroth-order optimization under symmetric heavy-tailed noise, where the noise may have unbounded moments, including unbounded expectation. It introduces a novel zeroth-order oracle and median-based gradient estimation with clipping, yielding high-probability convergence rates that, for any $κ > 0$, match the best-known rates for the bounded-variance case. Two main algorithms are proposed: ZO-clipped-med-SSTM for unconstrained problems and ZO-clipped-med-SMD for constrained domains, both of which use mini-batched medians of sampled gradient differences for robustness. The methods extend to stochastic multi-armed bandits, where Clipped-INF-med-SMD achieves $\tilde{O}(\sqrt{dT})$ regret.
Abstract
In this paper, we consider non-smooth convex optimization with a zeroth-order oracle corrupted by symmetric stochastic noise. Unlike existing high-probability results, which require the noise to have a bounded $κ$-th moment with $κ\in (1,2]$, our results allow even heavier noise with any $κ> 0$; e.g., the noise distribution can have unbounded expectation. Our convergence rates match the best-known ones for the bounded-variance case: to achieve function accuracy $\varepsilon$, our methods with a Lipschitz oracle require $\tilde{O}(d^2\varepsilon^{-2})$ iterations for any $κ> 0$. We build a median gradient estimate with bounded second moment as the mini-batched median of the sampled gradient differences. We apply this technique to the stochastic multi-armed bandit problem with a heavy-tailed reward distribution and achieve $\tilde{O}(\sqrt{dT})$ regret. We demonstrate the performance of our zeroth-order and MAB algorithms for various $κ\in (0,2]$ on synthetic and real-world data. Our methods perform no worse than SOTA approaches and dramatically outperform them for $κ\leq 1$.
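The median-plus-clipping idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `clip`, `median_grad_estimate`, and `cauchy_oracle`, the smoothing parameter `tau`, the clipping level `lam`, and the exact form of the estimate (a two-point difference along a random unit direction, with the median taken over `2m + 1` noisy differences) are assumptions made for illustration. Cauchy noise is used because it is symmetric and has no finite expectation.

```python
import numpy as np

def clip(g, lam):
    """Rescale g so that its Euclidean norm is at most lam."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)

def median_grad_estimate(oracle, x, tau, m, rng):
    """Two-point gradient estimate along a random unit direction.

    The single noisy difference is replaced by the median of 2m+1
    independent noisy differences (an assumed form of the estimate).
    """
    d = x.size
    e = rng.standard_normal(d)
    e /= np.linalg.norm(e)  # uniformly distributed direction on the unit sphere
    diffs = [oracle(x + tau * e) - oracle(x - tau * e)
             for _ in range(2 * m + 1)]
    return d / (2.0 * tau) * np.median(diffs) * e

def cauchy_oracle(f, rng, scale=1.0):
    """Hypothetical zeroth-order oracle: f(x) plus symmetric Cauchy noise."""
    return lambda x: f(x) + scale * rng.standard_cauchy()

# Usage: one clipped median estimate for the non-smooth function f(x) = ||x||_1.
rng = np.random.default_rng(0)
f = lambda x: float(np.abs(x).sum())
oracle = cauchy_oracle(f, rng)
x = np.ones(5)
g = clip(median_grad_estimate(oracle, x, tau=0.1, m=10, rng=rng), lam=10.0)
```

In this sketch the difference of two independent Cauchy draws is again symmetric, so the sample median of the differences concentrates around the noise-free difference even though the mean does not exist; clipping then bounds the norm of whatever estimate is passed to the optimizer.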
