
An Improved Analysis of the Clipped Stochastic subGradient Method under Heavy-Tailed Noise

Daniela Angela Parletta, Andrea Paudice, Saverio Salzo

TL;DR

This work analyzes the clipped stochastic subgradient method (C-SsGM) for nonsmooth convex optimization over possibly unbounded domains, under heavy-tailed noise that has finite moments only up to order $p$ ($p\in(1,2]$). It establishes the first convergence rates in expectation for the last iterate and provides near-optimal rates for both the last iterate and the average iterate, including tightened high-probability bounds and epoch-based reductions that relate last-iterate errors to the known average-iterate results. The results extend to kernelized supervised learning, enabling kernel-based implementations with provable risk guarantees even under heavy tails, and are accompanied by preliminary numerical experiments on robustness and performance. The paper also discusses practical parameter schedules for both the finite-horizon and the anytime settings, highlights improvements over prior work, and outlines open questions on removing the remaining log factors and achieving a unified analysis for the last iterate. Overall, clipping significantly enhances robustness and convergence in stochastic subgradient methods under heavy-tailed noise, with strong implications for large-scale nonsmooth optimization and kernel methods.

Abstract

In this paper, we provide novel optimal (or near-optimal) convergence rates for a clipped version of the stochastic subgradient method. We consider nonsmooth convex problems over possibly unbounded domains, under heavy-tailed noise that possesses only the first $p$ moments for $p \in \left]1,2\right]$. For the last iterate, we establish convergence in expectation for the objective values with rates of order $(\log^{1/p} k)/k^{(p-1)/p}$ and $1/k^{(p-1)/p}$, for the anytime and finite-horizon settings respectively. We also derive new convergence rates, in expectation and with high probability, for the objective values along the average iterates, improving existing results by a $\log^{(2p-1)/p} k$ factor. These results are applied to the problem of supervised learning with kernels, demonstrating the effectiveness of our theory. Finally, we give preliminary experiments.
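
To make the clipping step concrete, the following is a minimal Python sketch of a clipped stochastic subgradient iteration on a toy problem. It is not the paper's algorithm as stated: the objective $f(x) = \|x - x^\star\|_1$, the symmetric Pareto noise, the function names (`clip`, `noisy_subgradient`, `clipped_ssgm`), and the schedules $\gamma_k = \gamma_0 k^{-1/p}$ and $\lambda_k = \lambda_0 k^{1/p}$ are all assumptions chosen for illustration. The noise has tail index 1.6, so its moments are finite only for orders below 1.6 and its variance is infinite.

```python
import math
import random


def clip(g, level):
    """Rescale g so that its Euclidean norm is at most `level`."""
    norm = math.sqrt(sum(v * v for v in g))
    if norm <= level:
        return list(g)
    scale = level / norm
    return [scale * v for v in g]


def noisy_subgradient(x, target, rng, tail_index):
    """Subgradient of f(x) = ||x - target||_1 plus symmetric Pareto noise.

    The noise is zero-mean by symmetry and has finite moments only for
    orders strictly below `tail_index` (an illustrative noise model).
    """
    g = []
    for xi, ti in zip(x, target):
        sub = (xi > ti) - (xi < ti)  # sign of (xi - ti); 0 at the kink
        noise = rng.choice((-1.0, 1.0)) * rng.paretovariate(tail_index)
        g.append(sub + noise)
    return g


def clipped_ssgm(x0, target, steps, p=1.5, gamma0=0.5, lam0=2.0,
                 tail_index=1.6, seed=0):
    """Run clipped stochastic subgradient steps; return (last, average).

    The schedules below are assumed for illustration only:
    step size gamma0 * k**(-1/p), clipping level lam0 * k**(1/p).
    """
    rng = random.Random(seed)
    x = list(x0)
    avg = list(x0)  # running mean of x_0, ..., x_k
    for k in range(1, steps + 1):
        gamma = gamma0 / k ** (1.0 / p)
        lam = lam0 * k ** (1.0 / p)
        g = clip(noisy_subgradient(x, target, rng, tail_index), lam)
        x = [xi - gamma * gi for xi, gi in zip(x, g)]
        avg = [a + (xi - a) / (k + 1) for a, xi in zip(avg, x)]
    return x, avg


if __name__ == "__main__":
    last, average = clipped_ssgm([5.0, -5.0], [1.0, 2.0], steps=2000)
    print("last iterate:   ", last)
    print("average iterate:", average)
```

The sketch returns both the last and the average iterate because the abstract states separate rates for each. No projection step is included, which matches the unconstrained (unbounded-domain) case only.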

Paper Structure (28 sections, 23 theorems, 129 equations, 1 figure, 2 tables, 2 algorithms)


Key Result

Lemma 1

The iterates generated by the clipped stochastic subgradient method (C-SsGM) satisfy the following conditions for every $k \in \mathbb{N}$.

Figures (1)

  • Figure 1: Experimental results. Left panel: comparison of the parameter settings of C-SsGM in the anytime (AT) and finite-horizon (FH) cases. Right panel: comparison of the parameter setting for the average iterate proposed in this work with that proposed in Liu2023b.

Theorems & Definitions (46)

  • Lemma 1
  • Lemma 2
  • Lemma 3: A standard inequality
  • proof
  • Theorem 1
  • proof
  • Remark 1
  • Corollary 1
  • Remark 2
  • Proposition 1: Freedman's inequality
  • ...and 36 more