Optimal Rates for Robust Stochastic Convex Optimization

Changyu Gao, Andrew Lowy, Xingyu Zhou, Stephen J. Wright

TL;DR

This work tackles robust stochastic convex optimization under the $\epsilon$-contamination model, aiming to determine minimax-optimal excess risk without requiring stringent per-sample smoothness. It introduces a net-based projected gradient framework that robustly estimates gradients at a dense net of the domain, leveraging dense-good-sets and a stabilized gradient estimator to achieve a minimax-optimal excess risk of $\tilde{O}\big(D(\sigma\sqrt{\epsilon}+\sigma\sqrt{\frac{d\log(1/\tau)}{n}})\big)$, with sample complexity $n=\tilde{\Omega}(d/\epsilon)$, under the mild assumption that the population risk $\overline f$ is $\bar{\beta}$-smooth and gradient covariances are bounded by $\Sigma_w\preceq \sigma^2 I$. The authors also provide a simpler projected-gradient-descent alternative under stronger assumptions, a lower bound showing minimax-optimality, and extensions to unknown $\sigma$ and nonsmooth population risks via convolution smoothing. Notably, the method improves upon SEVER by relaxing assumptions and reducing required samples while preserving near-optimal rates, with practical polynomial-time implementations and clear implications for robust optimization in high dimensions.

Abstract

Machine learning algorithms in high-dimensional settings are highly susceptible to the influence of even a small fraction of structured outliers, making robust optimization techniques essential. In particular, within the $\epsilon$-contamination model, where an adversary can inspect and replace up to an $\epsilon$-fraction of the samples, a fundamental open problem is determining the optimal rates for robust stochastic convex optimization (SCO) under such contamination. We develop novel algorithms that achieve minimax-optimal excess risk (up to logarithmic factors) under the $\epsilon$-contamination model. Our approach improves over existing algorithms, which are not only suboptimal but also require stringent assumptions, including Lipschitz continuity and smoothness of individual sample functions. By contrast, our optimal algorithms do not require these stringent assumptions, assuming only population-level smoothness of the loss. Moreover, our algorithms can be adapted to handle the case in which the covariance parameter is unknown, and can be extended to nonsmooth population risks via convolutional smoothing. We complement our algorithmic developments with a tight information-theoretic lower bound for robust SCO.
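
As a rough illustration of the setting described above, the following sketch simulates $\epsilon$-contamination and runs projected gradient descent with a robustly aggregated gradient. It is not the paper's algorithm: the coordinate-wise trimmed mean is a simple stand-in for the paper's robust gradient estimator (and, unlike it, does not give dimension-free error), the net-based construction is omitted, and the quadratic loss, the helper names (`contaminate`, `trimmed_mean`, `project_ball`, `robust_pgd`), and all parameter values are assumptions chosen for the example.

```python
import math
import random

def contaminate(samples, eps, outlier, rng):
    """Replace an eps-fraction of the samples with an adversarial point."""
    n = len(samples)
    corrupted = [list(x) for x in samples]
    for i in rng.sample(range(n), int(eps * n)):
        corrupted[i] = list(outlier)
    return corrupted

def trimmed_mean(vectors, eps):
    """Coordinate-wise trimmed mean (stand-in estimator, not the paper's):
    per coordinate, drop the eps-fraction smallest and largest values."""
    n, d = len(vectors), len(vectors[0])
    k = math.ceil(eps * n)
    out = []
    for j in range(d):
        col = sorted(v[j] for v in vectors)
        kept = col[k:n - k]
        out.append(sum(kept) / len(kept))
    return out

def project_ball(w, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = math.sqrt(sum(c * c for c in w))
    if norm <= radius:
        return list(w)
    return [c * radius / norm for c in w]

def robust_pgd(samples, eps, radius, step=0.5, iters=50):
    """Projected gradient descent on the assumed loss f(w; x) = 0.5*||w - x||^2,
    aggregating per-sample gradients with the trimmed mean."""
    d = len(samples[0])
    w = [0.0] * d
    for _ in range(iters):
        grads = [[w[j] - x[j] for j in range(d)] for x in samples]
        g = trimmed_mean(grads, eps)
        w = project_ball([w[j] - step * g[j] for j in range(d)], radius)
    return w

def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

rng = random.Random(0)
d, n, eps = 3, 1000, 0.1                      # illustrative values only
true_mean = [1.0] * d                         # minimizer of the population risk
clean = [[rng.gauss(1.0, 1.0) for _ in range(d)] for _ in range(n)]
data = contaminate(clean, eps, outlier=[50.0] * d, rng=rng)

w_robust = robust_pgd(data, eps, radius=10.0)
naive = [sum(x[j] for x in data) / n for j in range(d)]
w_naive = project_ball(naive, 10.0)           # non-robust empirical minimizer
err_robust = dist(w_robust, true_mean)
err_naive = dist(w_naive, true_mean)
print(err_robust, err_naive)
```

In this toy problem the non-robust empirical minimizer is pulled far toward the outliers, while the trimmed-mean iterate stays close to the population minimizer. The gap between the two is only meant to show why gradient aggregation must be robust, not to reproduce the rates stated in the abstract.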

Paper Structure

This paper contains 28 sections, 21 theorems, 26 equations, 1 table, 4 algorithms.

Key Result

Theorem 11

Suppose that the assumptions referenced as assump:conv_loss, assump:cov, and assump:smooth-bounded-var-new hold. There are choices of stepsizes $\{\eta_t\}_{t=1}^T$ and $T$ such that, with probability at least $1-\tau$, the output of the algorithm has excess risk $\tilde{O}\big(D(\sigma\sqrt{\epsilon}+\sigma\sqrt{\frac{d\log(1/\tau)}{n}})\big)$. As a consequence, the algorithm achieves excess risk of $O(D \sigma\sqrt{\epsilon})$ with high probability whenever $n = \tilde{\Omega}(d / \epsilon)$.
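
The sample-size condition can be checked against the bound reported in the TL;DR. The short calculation below treats $\log(1/\tau)$ as one of the logarithmic factors absorbed by the tilde notation; it is a sanity check on the stated rates, not a step taken from the paper's proof.

```latex
\sigma\sqrt{\frac{d\log(1/\tau)}{n}} \;\le\; \sigma\sqrt{\epsilon}
\quad\Longleftrightarrow\quad
n \;\ge\; \frac{d\log(1/\tau)}{\epsilon},
\qquad\text{in which case}\qquad
D\Big(\sigma\sqrt{\epsilon}+\sigma\sqrt{\tfrac{d\log(1/\tau)}{n}}\Big) \;\le\; 2D\sigma\sqrt{\epsilon}.
```

So once $n$ is of order $d/\epsilon$ (up to logarithmic factors), the statistical term is dominated by the contamination term $D\sigma\sqrt{\epsilon}$, and collecting more samples no longer improves the bound.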

Theorems & Definitions (29)

  • Definition 1: $\epsilon$-contamination model
  • Definition 4: "Good" set
  • Definition 6: $\gamma$-approximate learner
  • Remark 7
  • Theorem 11
  • Remark 12
  • Theorem 13
  • Proposition 14
  • Remark 15
  • Theorem 16
  • ...and 19 more