Optimal Rates for Robust Stochastic Convex Optimization
Changyu Gao, Andrew Lowy, Xingyu Zhou, Stephen J. Wright
TL;DR
This work tackles robust stochastic convex optimization (SCO) under the $\epsilon$-contamination model, aiming to determine the minimax-optimal excess risk without requiring stringent per-sample smoothness. It introduces a net-based projected gradient framework that robustly estimates gradients at the points of a dense net of the domain. Using dense good sets and a stabilized gradient estimator, it achieves a minimax-optimal excess risk of $\tilde{O}\big(D(\sigma\sqrt{\epsilon}+\sigma\sqrt{\frac{d\log(1/\tau)}{n}})\big)$, where $D$ is the domain diameter and $\tau$ the failure probability, with sample complexity $n=\tilde{\Omega}(d/\epsilon)$. This holds under the mild assumptions that the population risk $\overline{f}$ is $\bar{\beta}$-smooth and that the gradient covariances are bounded as $\Sigma_w\preceq \sigma^2 I$. The authors also provide a simpler projected-gradient-descent alternative under stronger assumptions, a lower bound establishing minimax optimality, and extensions to unknown $\sigma$ and to nonsmooth population risks via convolutional smoothing. Compared with SEVER, the method relaxes the required assumptions and needs fewer samples while attaining minimax-optimal rates up to logarithmic factors, and it admits polynomial-time implementations, with direct implications for robust optimization in high dimensions.
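One way to read the stated sample complexity is to compare the two terms inside the excess-risk bound. This balancing argument is offered as an illustration, not as a derivation taken from the paper:

$$\sigma\sqrt{\frac{d\log(1/\tau)}{n}} \;\le\; \sigma\sqrt{\epsilon} \quad\Longleftrightarrow\quad n \;\ge\; \frac{d\log(1/\tau)}{\epsilon}.$$

So once $n$ is of order $d/\epsilon$ up to logarithmic factors, consistent with $n=\tilde{\Omega}(d/\epsilon)$, the statistical term is dominated and the excess risk is $\tilde{O}(D\sigma\sqrt{\epsilon})$. By the matching lower bound, this contamination-driven floor cannot be lowered by collecting more samples.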
Abstract
Machine learning algorithms in high-dimensional settings are highly susceptible to the influence of even a small fraction of structured outliers, making robust optimization techniques essential. In particular, within the $\epsilon$-contamination model, where an adversary can inspect and replace up to an $\epsilon$-fraction of the samples, a fundamental open problem is determining the optimal rates for robust stochastic convex optimization (SCO) under such contamination. We develop novel algorithms that achieve minimax-optimal excess risk (up to logarithmic factors) under the $\epsilon$-contamination model. Our approach improves over existing algorithms, which are not only suboptimal but also require stringent assumptions, including Lipschitz continuity and smoothness of individual sample functions. By contrast, our optimal algorithms do not require these stringent assumptions, assuming only population-level smoothness of the loss. Moreover, our algorithms can be adapted to handle the case in which the covariance parameter is unknown, and can be extended to nonsmooth population risks via convolutional smoothing. We complement our algorithmic developments with a tight information-theoretic lower bound for robust SCO.
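To make the $\epsilon$-contamination model concrete, here is a minimal Python sketch of projected gradient descent with a robust gradient estimate. It assumes a toy quadratic loss $f(w;x)=\tfrac12\|w-x\|^2$ whose population minimizer is $w^*=\mathbb{E}[x]$, a Euclidean ball as the domain, and an adversary that moves an $\epsilon$-fraction of samples to one far point. All helper names (`trimmed_mean`, `robust_gradient`, `projected_gd`, `contaminated_samples`) and the data are hypothetical. The robust step uses a coordinate-wise trimmed mean, a simple swapped-in estimator. It is not the authors' stabilized estimator and does not attain the dimension-optimal $\sigma\sqrt{\epsilon}$ error. It only illustrates why replacing the empirical mean gradient with a robust estimate limits the adversary's influence.

```python
import math
import random

# Hypothetical illustration only; not the paper's net-based algorithm or estimator.


def trimmed_mean(values, trim):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    k = int(trim * len(values))
    kept = sorted(values)[k:len(values) - k]
    return sum(kept) / len(kept)


def robust_gradient(w, samples, trim):
    """Coordinate-wise trimmed mean of per-sample gradients of
    f(w; x) = 0.5 * ||w - x||^2, whose gradient is w - x."""
    return [trimmed_mean([w[j] - x[j] for x in samples], trim) for j in range(len(w))]


def naive_gradient(w, samples):
    """Plain empirical-mean gradient (not robust to outliers)."""
    n = len(samples)
    return [sum(w[j] - x[j] for x in samples) / n for j in range(len(w))]


def project_to_ball(w, radius):
    """Euclidean projection onto the ball {w : ||w|| <= radius}."""
    norm = math.sqrt(sum(v * v for v in w))
    if norm <= radius:
        return list(w)
    return [v * radius / norm for v in w]


def projected_gd(grad_fn, d, radius, step=0.5, iters=100):
    """Full-batch projected gradient descent started at the origin."""
    w = [0.0] * d
    for _ in range(iters):
        g = grad_fn(w)
        w = project_to_ball([wj - step * gj for wj, gj in zip(w, g)], radius)
    return w


def contaminated_samples(mu, n, eps, outlier, seed=0):
    """Draw n clean samples x ~ N(mu, I), then let an adversary
    replace the first floor(eps * n) of them with `outlier`."""
    rng = random.Random(seed)
    samples = [[m + rng.gauss(0.0, 1.0) for m in mu] for _ in range(n)]
    for i in range(int(eps * n)):
        samples[i] = list(outlier)
    return samples


def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


mu = [1.0, -1.0, 0.5, 0.0, 2.0]  # population minimizer w* = E[x]
data = contaminated_samples(mu, n=2000, eps=0.1, outlier=[50.0] * 5)
w_naive = projected_gd(lambda w: naive_gradient(w, data), d=5, radius=5.0)
# Trim 0.2 from each side, which exceeds eps = 0.1, so every outlier is discarded.
w_robust = projected_gd(lambda w: robust_gradient(w, data, trim=0.2), d=5, radius=5.0)
print(f"naive error:  {distance(w_naive, mu):.2f}")
print(f"robust error: {distance(w_robust, mu):.2f}")
```

In this toy setup the outliers pull the naive mean gradient's fixed point outside the ball, so the naive iterate ends on the boundary, far from $w^*$, while the trimmed-mean iterate stays close to it. The small remaining error comes from trimming clean samples asymmetrically, which is one reason stronger high-dimensional estimators are needed for optimal rates.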
