Some Primal-Dual Theory for Subgradient Methods for Strongly Convex Optimization

Benjamin Grimmer; Danlin Li

TL;DR

The paper establishes primal–dual equivalences between classic subgradient methods and dual averaging in strongly convex, possibly non-Lipschitz settings with additive regularizers and functional constraints. It derives a unified $O(1/T)$ convergence bound that couples the primal and dual gaps with the distance to the optimum, and shows how dual certificates yield computable stopping criteria at no extra computational cost. The framework covers stochastic and switching subgradient variants, extends to non-Lipschitz growth, and shows how non-Lipschitz terms affect convergence through the constants $L_0$, $L_1$, and $C_0$, while still ensuring eventual convergence. Numerical experiments confirm the practical effectiveness of the primal–dual bounds and stopping rules and illustrate how stepsize and weight choices shape performance, including strategies to mitigate early divergence. Overall, the work provides a principled, versatile lens for analyzing and stopping subgradient-type methods in a broad strongly convex setting.

Abstract

We consider (stochastic) subgradient methods for strongly convex but potentially nonsmooth non-Lipschitz optimization. We provide new equivalent dual descriptions (in the style of dual averaging) for the classic subgradient method, the proximal subgradient method, and the switching subgradient method. These equivalences enable $O(1/T)$ convergence guarantees in terms of both their classic primal gap and a not previously analyzed dual gap for strongly convex optimization. Consequently, our theory provides these classic methods with simple, optimal stopping criteria and optimality certificates at no added computational cost. Our results apply to a wide range of stepsize selections and of non-Lipschitz ill-conditioned problems where the early iterations of the subgradient method may diverge exponentially quickly (a phenomenon which, to the best of our knowledge, no prior works address). Even in the presence of such undesirable behaviors, our theory still ensures and bounds eventual convergence.

Paper Structure

This paper contains 16 sections, 10 theorems, 76 equations, 3 figures, and 2 tables.

Introduction
Our Contributions
Preliminaries and Algorithm Definitions
Related Work
Assumptions for our Convergence Theory
Primal-Dual Equivalence and Convergence Analysis
Statement of Primal-Dual Convergence Guarantees
Proof of Primal-Dual Convergence Guarantees
Proof of Proposition \ref{prop:slater-ratio}
Proof of Theorem \ref{thm:super-rate}
Numerical Experiments
Performance under Varied Stepsize Selections
High Accuracy of Primal-Dual Stopping Criteria
Accuracy of $C_0$ at Predicting Early Iterate Divergence
Acknowledgements.
...and 1 more section

Key Result

Lemma 2.1

$y_{k+1}$ is the unique minimizer of $M^{(k)}(y)+\frac{\beta_k}{2}\|y-y_0\|_2^2$.

Figures (3)

Figure 1: Bounds and observed performance for different $\lambda_k$ with $\bar{\beta}=0$.
Figure 2: Observed performance for various $\sigma$ with $\alpha_k = \frac{2}{\mu(k+2)}$.
Figure 3: Observed performance for various $\sigma$ with $\alpha_0 = 1/\mu$ and $\alpha_k = \min\left\{\frac{1}{L_1}, \frac{2}{\mu(k+2)}\right\}$ for $k>0$, with corresponding weights $\lambda_0 = 1$, $\lambda_k = \frac{\alpha_k}{1-\mu\alpha_k}\frac{\lambda_{k-1}}{\alpha_{k-1}}$, and well-controlled $T_0 = 0$.
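
The stepsize and weight recursion in the Figure 3 caption can be tabulated directly. The sketch below is illustrative only and is not the authors' code: the function name `stepsizes_and_weights` and its arguments are hypothetical, and it assumes that $2/\mu(k+2)$ is read as $2/(\mu(k+2))$ and that $\mu > 0$, $L_1 > 0$.

```python
def stepsizes_and_weights(mu, L1, T):
    """Tabulate alpha_k and lambda_k for k = 0, ..., T-1.

    Assumed reading of the Figure 3 caption (hypothetical helper):
      alpha_0 = 1/mu,  alpha_k = min(1/L1, 2/(mu*(k+2))) for k > 0,
      lambda_0 = 1,    lambda_k = alpha_k/(1 - mu*alpha_k) * lambda_{k-1}/alpha_{k-1}.
    """
    if mu <= 0 or L1 <= 0:
        raise ValueError("mu and L1 must be positive")
    alphas = [1.0 / mu]
    lambdas = [1.0]
    for k in range(1, T):
        alpha = min(1.0 / L1, 2.0 / (mu * (k + 2)))
        # For k >= 1 we have mu*alpha <= 2/3, so the denominator stays positive.
        lam = alpha / (1.0 - mu * alpha) * lambdas[-1] / alphas[-1]
        alphas.append(alpha)
        lambdas.append(lam)
    return alphas, lambdas


if __name__ == "__main__":
    # L1 small enough that the cap 1/L1 never binds (Figure 2 schedule).
    alphas, lambdas = stepsizes_and_weights(mu=1.0, L1=1e-9, T=6)
    for k, (a, lam) in enumerate(zip(alphas, lambdas)):
        print(k, a, lam)
```

When the cap $1/L_1$ never binds, so that $\alpha_k = 2/(\mu(k+2))$ for every $k$, the recursion simplifies to $\lambda_k = k+1$. With a larger $L_1$, the early stepsizes are held at $1/L_1$ and the early weights shrink accordingly.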

Theorems & Definitions (23)

Lemma 2.1
proof
Lemma 2.2
Lemma 2.3
Lemma 2.4
Theorem 3.1
proof
Remark 1
Remark 2
Proposition 3.1
...and 13 more
