Revisiting Stochastic Proximal Point Methods: Generalized Smoothness and Similarity

Zhirayr Tovmasyan, Grigory Malinovsky, Laurent Condat, Peter Richtárik

TL;DR

This work addresses stochastic nonsmooth optimization by studying the stochastic proximal point method (SPPM) and introducing a generalized $\phi$-smoothness framework that extends beyond standard Lipschitz smoothness. The authors establish convergence guarantees for SPPM under $\phi$-smoothness, including exact and inexact proximal evaluations, and derive sublinear and linear rates in convex and strongly convex settings, respectively. They further extend the analysis to the expected similarity (Star Similarity) setting, deriving convergence in both interpolation and non-interpolation regimes, and provide practical guidance on inner-iteration requirements for the proximal subproblem. Experiments on finite-sum problems validate the theoretical predictions and demonstrate robustness to stepsize choices and initializations. Overall, the paper offers a unifying, general framework for stochastic proximal methods that encompasses existing results as special cases and broadens applicability to real-world ML problems with nonsmooth objectives.
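The TL;DR refers to inexact proximal evaluations and to the number of inner iterations spent on each proximal subproblem. Below is a minimal sketch of SPPM with an optional inexact proximal step. It assumes a least-squares finite sum $f(x) = \frac{1}{n}\sum_i \big[\tfrac12 (a_i^\top x - b_i)^2 + \tfrac{\mu}{2}\|x\|^2\big]$ and uses plain gradient descent (step $1/L$, warm-started at $x_k$) as the inner solver. The problem choice, the inner solver, and the names `sppm`, `exact_prox`, and `inexact_prox` are hypothetical illustrations. This is not the authors' code, and their SPPM-inexact inner solver may differ.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def grad_subproblem(y, x, a, b, gamma, mu):
    """Gradient of psi(y) = 0.5*(a.y - b)^2 + 0.5*mu*||y||^2 + ||y - x||^2 / (2*gamma)."""
    r = dot(a, y) - b
    return [r * ai + mu * yi + (yi - xi) / gamma for ai, yi, xi in zip(a, y, x)]


def exact_prox(x, a, b, gamma, mu):
    """Closed-form prox: solve (a a^T + c I) y = b*a + x/gamma with c = mu + 1/gamma (Sherman-Morrison)."""
    c = mu + 1.0 / gamma
    r = [b * ai + xi / gamma for ai, xi in zip(a, x)]
    coef = dot(a, r) / (c * (c + dot(a, a)))
    return [ri / c - coef * ai for ri, ai in zip(r, a)]


def inexact_prox(x, a, b, gamma, mu, inner_iters):
    """Approximate prox with a fixed number of gradient steps on psi, warm-started at x."""
    lipschitz = dot(a, a) + mu + 1.0 / gamma  # largest eigenvalue of psi's Hessian
    y = list(x)
    for _ in range(inner_iters):
        g = grad_subproblem(y, x, a, b, gamma, mu)
        y = [yi - gi / lipschitz for yi, gi in zip(y, g)]
    return y


def sppm(A, B, x0, gamma, mu=0.0, iters=1000, inner_iters=None, seed=0):
    """SPPM: sample one term uniformly, then take an exact (inner_iters=None) or inexact prox step on it."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        i = rng.randrange(len(A))
        if inner_iters is None:
            x = exact_prox(x, A[i], B[i], gamma, mu)
        else:
            x = inexact_prox(x, A[i], B[i], gamma, mu, inner_iters)
    return x


def loss(A, B, x, mu=0.0):
    """Finite-sum objective f(x) = mean_i 0.5*(a_i.x - b_i)^2 + 0.5*mu*||x||^2."""
    data_term = sum(0.5 * (dot(a, x) - b) ** 2 for a, b in zip(A, B)) / len(A)
    return data_term + 0.5 * mu * dot(x, x)


if __name__ == "__main__":
    rng = random.Random(1)
    d, n = 5, 50
    x_star = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    B = [dot(a, x_star) for a in A]  # consistent system: interpolation regime
    for gamma in (0.1, 1.0, 10.0):
        x = sppm(A, B, [0.0] * d, gamma, iters=2000, inner_iters=5)
        print(f"gamma={gamma}: loss={loss(A, B, x):.3e}")
```

In this consistent (interpolation, $\mu = 0$) setup, the printed loss is expected to be close to zero for each stepsize in the loop. Passing `inner_iters=None` switches to the closed-form proximal step for comparison. The rank-one quadratic is used only because its prox has a cheap closed form. For a general $f_\xi$, only an iterative inner solver such as `inexact_prox` would be available.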

Abstract

The growing prevalence of nonsmooth optimization problems in machine learning has spurred significant interest in generalized smoothness assumptions. Among these, the $(L_0,L_1)$-smoothness assumption has emerged as one of the most prominent. While proximal methods are well-suited and effective for nonsmooth problems in deterministic settings, their stochastic counterparts remain underexplored. This work focuses on the stochastic proximal point method (SPPM), valued for its stability and minimal hyperparameter tuning, advantages often missing in stochastic gradient descent (SGD). We propose a novel $\phi$-smoothness framework and provide a comprehensive analysis of SPPM without relying on traditional smoothness assumptions. Our results are highly general, encompassing existing findings as special cases. Furthermore, we examine SPPM under the widely adopted expected similarity assumption, thereby extending its applicability to a broader range of scenarios. Our theoretical contributions are illustrated and validated by practical experiments.
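For reference, the standard SPPM update and the $(L_0,L_1)$-smoothness condition mentioned in the abstract can be written as follows. The notation ($f_\xi$, $\gamma$, $\mathcal{D}$, $x_k$) is assumed here, and the paper's own $\phi$-smoothness assumption is deliberately not reproduced. When each $f_\xi$ is convex, the proximal subproblem is strongly convex, so its minimizer is unique. Writing its optimality condition shows that SPPM is an implicit gradient step. Classical Lipschitz smoothness of the gradient corresponds to the special case $L_1 = 0$.

```latex
\begin{aligned}
&\text{Problem:} && \min_{x \in \mathbb{R}^d} f(x) := \mathbb{E}_{\xi \sim \mathcal{D}}\big[f_\xi(x)\big] \\
&\text{SPPM step:} && \xi_k \sim \mathcal{D}, \qquad
  x_{k+1} = \operatorname{prox}_{\gamma f_{\xi_k}}(x_k)
  := \arg\min_{x \in \mathbb{R}^d} \Big\{ f_{\xi_k}(x) + \tfrac{1}{2\gamma}\|x - x_k\|^2 \Big\} \\
&\text{Optimality (} f_{\xi_k} \text{ differentiable):} &&
  0 = \nabla f_{\xi_k}(x_{k+1}) + \tfrac{1}{\gamma}\,(x_{k+1} - x_k)
  \;\Longleftrightarrow\;
  x_{k+1} = x_k - \gamma\, \nabla f_{\xi_k}(x_{k+1}) \\
&(L_0,L_1)\text{-smoothness:} && \|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x
\end{aligned}
```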

Paper Structure

This paper contains 20 sections, 18 theorems, 134 equations, 3 figures, 1 table, 2 algorithms.

Key Result

Lemma 4.2

The $\phi$-smoothness assumption implies that, for $\mathcal{D}$-almost every sample $\xi$,

Figures (3)

  • Figure 1: Convergence behavior of SPPM-inexact with different stepsizes.
  • Figure 2: Convergence behavior of SPPM-inexact with different starting points.
  • Figure 3: Convergence behavior of SPPM-inexact with different inner iterations in strongly convex and convex settings.

Theorems & Definitions (25)

  • Lemma 4.2
  • Lemma 4.3
  • Lemma 4.4
  • Lemma 5.1
  • Lemma 5.2
  • Theorem 5.3
  • Theorem 5.4
  • Theorem 6.1
  • Theorem 6.2
  • Theorem 6.4
  • ...and 15 more