Subgradient Methods for Nonsmooth Convex Functions with Adversarial Errors

Martijn Gösgens, Bart P. G. Van Parys

TL;DR

It is shown that the classical averaged subgradient descent method, which is optimal in the noiseless case, has worst-case performance that deteriorates quadratically with the corruption budget, and a novel lower bound on the worst-case suboptimality gap of any first-order method satisfying a mild cone condition is proposed.

Abstract

We consider minimizing nonsmooth convex functions with bounded subgradients. However, instead of directly observing a subgradient at every step $k\in \{0, \dots, N-1\}$, we assume that the optimizer receives an adversarially corrupted subgradient. The adversary's power is limited by a finite corruption budget, but the adversary may strategically time its perturbations. We show that the classical averaged subgradient descent method, which is optimal in the noiseless case, has worst-case performance that deteriorates quadratically with the corruption budget. Using performance optimization programming, (i) we construct and analyze the performance of three novel subgradient descent methods, and (ii) propose a novel lower bound on the worst-case suboptimality gap of any first-order method satisfying a mild cone condition proposed by Fatkhullin et al. (2025). The worst-case performance of each of our methods degrades only linearly with the corruption budget. Furthermore, we show that the relative difference between their worst-case suboptimality gap and our lower bound decays as $\mathcal O(\log(N)/N)$, so that all three proposed subgradient descent methods are near-optimal. Our methods achieve such near-optimal performance without a need for momentum or averaging. This suggests that these techniques are not necessary in this context, which is in line with recent results by Zamani and Glineur (2025).
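The corrupted-oracle setting can be illustrated with a small simulation. This is a sketch under our own assumptions, not the authors' method or their worst-case adversary. It uses the one-dimensional $f(x)=L|x|$ (minimizer $x^*=0$) and the constant step $h=R/(L\sqrt N)$, which gives the standard noiseless guarantee $f(\bar x)-f^*\le RL/\sqrt N$ for the average of the query points $x_0,\dots,x_{N-1}$. It also assumes a hypothetical budget model $\sum_k |e_k|\le\Gamma$, spent by a one-shot adversary. The names `averaged_subgradient_descent` and `one_shot_adversary` are illustrative only.

```python
import math
from typing import Callable


def averaged_subgradient_descent(
    subgrad: Callable[[float], float],
    x0: float,
    step: float,
    n_steps: int,
    corrupt: Callable[[int, float, float], float],
) -> float:
    """Run x_{k+1} = x_k - step * (g_k + e_k) for k = 0..N-1 and return
    the average of the query points x_0, ..., x_{N-1}."""
    x = x0
    total = 0.0
    for k in range(n_steps):
        total += x              # x_k is a query point, so it enters the average
        g = subgrad(x)          # true subgradient at x_k (never seen directly)
        e = corrupt(k, x, g)    # adversarial error added by the oracle
        x = x - step * (g + e)  # the method only sees the corrupted g + e
    return total / n_steps


def one_shot_adversary(budget: float, attack_step: int):
    """Hypothetical adversary: spends its whole budget (sum_k |e_k| <= budget)
    at a single step, reporting a slope that points away from x* = 0."""
    def corrupt(k: int, x: float, g: float) -> float:
        if k != attack_step:
            return 0.0
        direction = 1.0 if x >= 0 else -1.0
        # Subtract the budget along sign(x): if budget > L, the step moves away from 0.
        return -direction * budget
    return corrupt


L, R, N = 1.0, 1.0, 50


def f(x: float) -> float:
    return L * abs(x)  # minimizer x* = 0, f* = 0


def subgrad(x: float) -> float:
    return L * (1.0 if x > 0 else -1.0 if x < 0 else 0.0)


h = R / (L * math.sqrt(N))  # constant step behind the noiseless RL/sqrt(N) bound

clean = f(averaged_subgradient_descent(subgrad, R, h, N, lambda k, x, g: 0.0))
early = f(averaged_subgradient_descent(subgrad, R, h, N, one_shot_adversary(5.0, 0)))
late = f(averaged_subgradient_descent(subgrad, R, h, N, one_shot_adversary(5.0, N - 1)))
print(clean <= R * L / math.sqrt(N), early > clean, late == clean)  # True True True
```

Spending the budget at the first step pushes the averaged iterate above the noiseless bound. The same budget spent at the last step only affects $x_N$, which is not averaged, so it has no effect. This is one simple way to see why the timing of perturbations matters.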

Paper Structure

This paper contains 11 sections, 18 theorems, 187 equations, 2 figures, 1 algorithm.

Key Result

Lemma 1

Let $\gamma\in(0,L\sqrt N)$. For the classical subgradient method with step sizes given in Equation (eq:classical-stepsize), there exists a problem instance whose suboptimality gap is bounded below by a quantity that exceeds the trivial bound $RL$ for $\gamma>2\sqrt2(N+1)^{1/4}L$ and grows unbounded for $\gamma\gg N^{1/4}L$.
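Because Lemma 1 restricts $\gamma$ to $(0, L\sqrt N)$, the regime in which its bound exceeds $RL$ is nonempty only when $2\sqrt2(N+1)^{1/4}L < L\sqrt N$, that is, when $8\sqrt{N+1} < N$. The short check below is a sketch; the function names are hypothetical. It evaluates the threshold from the lemma and finds the smallest $N$ for which that regime exists.

```python
import math


def trivial_bound_threshold(n_steps: int, lipschitz: float = 1.0) -> float:
    """Corruption level 2*sqrt(2)*(N+1)**(1/4)*L from Lemma 1."""
    return 2.0 * math.sqrt(2.0) * (n_steps + 1) ** 0.25 * lipschitz


def regime_is_nonempty(n_steps: int, lipschitz: float = 1.0) -> bool:
    """True if some admissible gamma in (0, L*sqrt(N)) lies above the threshold."""
    return trivial_bound_threshold(n_steps, lipschitz) < lipschitz * math.sqrt(n_steps)


smallest_n = next(n for n in range(1, 10_000) if regime_is_nonempty(n))
print(smallest_n)  # 65

for n in (100, 1_000, 10_000):
    # Fraction of the admissible range (0, L*sqrt(N)) that lies above the threshold.
    print(n, 1 - trivial_bound_threshold(n) / math.sqrt(n))
```

Algebraically, $8\sqrt{N+1}<N$ is equivalent to $N^2-64N-64>0$, whose positive root is about $64.98$. This agrees with the search result of $N=65$.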

Figures (2)

  • Figure 1: The conic combination $\alpha^{\mathbb L}$ (reindexed in $\theta\in [0, 1)$) proposed in Proposition \ref{prop:u-L-bound} for $N=100$ and various noise levels $\sigma$. The dashed line corresponds to the step sizes $\alpha_k'$ from Lemma \ref{lem:admissible} for $\sigma=5$.
  • Figure 2: Relative difference between the upper and lower worst-case performance bounds as a function of the number of steps $N$. For every $N$, the value shown is the maximum of the relative difference over $\sigma\in[0,\sqrt N]$. Theorem \ref{thm:squeeze} implies that this difference decays to zero at a rate of at least $\mathcal{O}(\log(N)/N)$.

Theorems & Definitions (26)

  • Lemma 1
  • Lemma 2: Admissible Subgradient Methods
  • proof
  • Lemma 3
  • proof: Proof of Lemma \ref{lem:bound}
  • Proposition 1
  • Corollary 1
  • Proposition 2
  • Lemma 4
  • proof
  • ...and 16 more