A Unified Theory of Stochastic Proximal Point Methods without Smoothness

Peter Richtárik; Abdurakhmon Sadiev; Yury Demidovich

TL;DR

This work provides a unified, smoothness-free theory for stochastic proximal point methods (SPPM) by introducing a universal SPPM-LC algorithm with learned corrections. A parametric $\sigma_k^2$ framework yields a single linear convergence theorem that covers standard SPPM, variance-reduced variants, and new algorithms under $\mu$-strong convexity. The analysis recovers best-known rates for existing methods, introduces five novel variants, and demonstrates their practical behavior through numerical experiments. The framework offers a robust, tuning-insensitive approach to stochastic optimization with proximal updates, and sets the stage for extensions to distributed, compressed, or nonconvex settings.

Abstract

This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM). Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning, a trait not shared by the dominant stochastic gradient descent (SGD) algorithm. A framework of assumptions that we introduce encompasses methods employing techniques such as variance reduction and arbitrary sampling. A cornerstone of our general theoretical approach is a parametric assumption on the iterates, correction and control vectors. We establish a single theorem that ensures linear convergence under this assumption and the $\mu$-strong convexity of the loss function, without the need to invoke smoothness. This integral theorem reinstates the best known complexity and convergence guarantees for several existing methods, which demonstrates the robustness of our approach. We expand our study by developing three new variants of SPPM, and through numerical experiments we elucidate various properties inherent to them.
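The robustness claim can be illustrated with a small sketch. It is not taken from the paper: the ridge-regularized least-squares losses $f_i(z) = \tfrac12 (a_i^\top z - b_i)^2 + \tfrac{\mu}{2}\|z\|^2$, the synthetic data, and the helper names `prox_ridge_ls` and `run` are all assumptions chosen because the proximal step then has a closed form. The sketch runs plain SPPM, $x_{k+1} = \operatorname{prox}_{\gamma f_i}(x_k)$, next to SGD for several stepsizes.

```python
import numpy as np

def prox_ridge_ls(x, a, b, mu, gamma):
    """Closed-form prox_{gamma*f}(x) for f(z) = 0.5*(a@z - b)**2 + 0.5*mu*||z||^2.

    Solves (a a^T + c I) z = a*b + x/gamma with c = mu + 1/gamma,
    using the Sherman-Morrison formula for the rank-one update.
    """
    c = mu + 1.0 / gamma
    v = a * b + x / gamma
    return (v - a * (a @ v) / (c + a @ a)) / c

def run(method, A, b, mu, gamma, iters=200, seed=0):
    """Run SPPM or SGD with uniform sampling of one loss per iteration."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)
        if method == "sppm":
            x = prox_ridge_ls(x, A[i], b[i], mu, gamma)
        else:  # plain SGD step on the same sampled loss
            x = x - gamma * (A[i] * (A[i] @ x - b[i]) + mu * x)
    return x

# Hypothetical synthetic problem (illustration only).
data_rng = np.random.default_rng(1)
n, d, mu = 50, 5, 0.1
A = data_rng.normal(size=(n, d))
b = data_rng.normal(size=n)
# Minimizer of the average loss, used only to measure the error.
x_star = np.linalg.solve(A.T @ A / n + mu * np.eye(d), A.T @ b / n)

errors = {}
with np.errstate(all="ignore"):  # SGD may overflow for large stepsizes
    for gamma in (0.01, 1.0, 100.0):
        for method in ("sppm", "sgd"):
            x = run(method, A, b, mu, gamma)
            errors[(method, gamma)] = np.linalg.norm(x - x_star)
            print(method, gamma, errors[(method, gamma)])
```

In this toy setting the SPPM iterates stay bounded for every stepsize, because each proximal step is a contraction toward the minimizer of the sampled loss, while SGD blows up once the stepsize is too large. The sketch says nothing about the rates proved in the paper.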

Paper Structure (30 sections, 22 theorems, 173 equations, 4 figures, 2 tables, 8 algorithms)

Introduction
Variations of SPPM
Contributions
Main result
Key assumption
Main theorem
Overview of specific methods and of the framework
A brief overview
Parameters of the framework
Five novel algorithms
Experiments
Further discussion
Extended literature overview
Special cases
Stochastic proximal point method with learned correction (SPPM-LC)
...and 15 more sections

Key Result

Theorem 1

Let the differentiability assumption and the $\mu$-strong convexity assumption hold. Let $\{x_k, h_k\}$ be the iterates produced by SPPM-LC, and assume that they satisfy the $\sigma_k^2$-assumption. Choose any $\gamma > 0$ and $\alpha > 0$ satisfying the inequalities […] and define the Lyapunov function […]. Then for all iterates $k \geq 0$ of SPPM-LC we have […], where the parameters […]
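For orientation, the two standard notions the theorem relies on can be written out as follows. These are the usual textbook definitions, stated here as an assumption about what the paper means; its own assumptions may be phrased differently, for instance for each stochastic loss rather than for a single function $f$.

```latex
% mu-strong convexity of a differentiable function f on R^d:
f(y) \;\geq\; f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{\mu}{2}\,\|y - x\|^2
\qquad \text{for all } x, y \in \mathbb{R}^d .

% Proximal operator of f with stepsize gamma > 0:
\operatorname{prox}_{\gamma f}(x) \;:=\; \arg\min_{z \in \mathbb{R}^d}
\left\{ f(z) + \frac{1}{2\gamma}\,\|z - x\|^2 \right\} .
```

Neither definition requires $\nabla f$ to be Lipschitz, which is consistent with the theorem not invoking smoothness.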

Figures (4)

Figure 1: Comparison of the performance of SPPM-US, SPPM-IS, SPPM-VS and SPPM-AS with $\tau=9$-nice sampling for different selections of stepsize $\gamma \in \{10^{-4}, 10^{-2}, 1, 10^{2}\}$.
Figure 2: Comparison of the performance of SPPM-AS with $\tau$-nice sampling with different selections of cardinality $\tau \in \{1,2,5,9, n =10 \}$ and stepsize $\gamma \in \{10^{-2}, 10^{-1}, 1\}$.
Figure 3: Comparison of the performance of SPPM-US, SPPM-GC and SPPM-star with different selections of stepsize $\gamma \in \{10^{-2}, 1, 10^2\}$.
Figure 4: Comparison of the performance of SPPM-GC, Point-SAGA and L-SVRP with different selections of probabilities $\gamma \in \{1/n = 10^{-3}, 5\cdot10^{-3}, 10^2, 5\cdot10^2, 10^{-1}, 1\}$. The stepsizes are taken according to the theory.
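The captions for Figures 1 and 2 refer to $\tau$-nice sampling. Assuming the usual meaning of the term (a minibatch drawn uniformly at random from all subsets of $\{1, \dots, n\}$ of cardinality $\tau$), a minimal sketch looks like this; the function name `tau_nice_sample` is hypothetical and indices are 0-based.

```python
import numpy as np

def tau_nice_sample(n, tau, rng):
    """Return a uniformly random subset of {0, ..., n-1} with exactly tau elements."""
    if not 1 <= tau <= n:
        raise ValueError("tau must satisfy 1 <= tau <= n")
    # Sampling without replacement makes every tau-subset equally likely.
    return np.sort(rng.choice(n, size=tau, replace=False))

rng = np.random.default_rng(0)
n = 10
batch = tau_nice_sample(n, 9, rng)  # cardinality used in Figure 1
print(batch)

# Empirical check: under tau-nice sampling each index appears with probability tau / n.
trials, tau = 20000, 3
counts = np.zeros(n)
for _ in range(trials):
    counts[tau_nice_sample(n, tau, rng)] += 1
freq = counts / trials
print(freq)
```

With $\tau = n$ the sample is the full index set, so every iteration uses all losses, and with $\tau = 1$ it reduces to uniform sampling of a single loss.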

Theorems & Definitions (45)

Theorem 1
Lemma 1
Proof of Lemma 1
Proof of Theorem 1
Lemma 2: SPPM
Proof of Lemma 2
Theorem 2
Proof of Theorem 2
Lemma 3: SPPM-NS
Proof
...and 35 more
