A Bregman Proximal Stochastic Gradient Method with Extrapolation for Nonconvex Nonsmooth Problems

Qingsong Wang; Zehui Liu; Chunfeng Cui; Deren Han

TL;DR

The paper tackles nonconvex nonsmooth optimization in which the differentiable part $f$ lacks a globally Lipschitz continuous gradient. It introduces a Bregman proximal stochastic gradient method with extrapolation (BPSGE), analyzed under a variance-reduction framework, that only requires smooth adaptivity of $f$. The analysis establishes subsequential convergence, global convergence via the Kurdyka-Łojasiewicz property, a sublinear rate for the subsequence, and a bound of at most $\mathcal{O}(\varepsilon^{-2})$ iterations in expectation to reach an $\varepsilon$-stationary point. Experiments on three real-world NMF-related problems (graph-regularized NMF, MF with weakly-convex regularization, and NMF with nonconvex sparsity constraints) show that extrapolation accelerates convergence relative to baselines without extrapolation, especially when combined with variance-reduced estimators such as SAGA and SARAH. The result is a scalable framework for large-scale, non-Lipschitz-smooth, nonconvex nonsmooth problems, with applications in clustering and matrix factorization.

Abstract

In this paper, we explore a specific optimization problem that involves the combination of a differentiable nonconvex function and a nondifferentiable function. The differentiable component lacks a global Lipschitz continuous gradient, posing challenges for optimization. To address this issue and accelerate the convergence, we propose a Bregman proximal stochastic gradient method with extrapolation (BPSGE), which only requires smooth adaptivity of the differentiable part. Under the variance reduction framework, we not only analyze the subsequential and global convergence of the proposed algorithm under certain conditions, but also analyze the sublinear convergence rate of the subsequence, and the complexity of the algorithm, revealing that the BPSGE algorithm requires at most $\mathcal{O}(\varepsilon^{-2})$ iterations in expectation to attain an $\varepsilon$-stationary point. To validate the effectiveness of our proposed algorithm, we conduct numerical experiments on three real-world applications: graph regularized nonnegative matrix factorization (NMF), matrix factorization with weakly-convex regularization, and NMF with nonconvex sparsity constraints. These experiments demonstrate that BPSGE is faster than the baselines without extrapolation.
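To make the idea of a Bregman proximal stochastic gradient step with extrapolation concrete, the following is a minimal sketch, not the authors' algorithm or code. Everything specific in it is an assumption chosen for illustration: the quartic least-squares objective $f(x)=\frac{1}{4n}\sum_i((a_i^\top x)^2-b_i)^2$ (whose gradient is not globally Lipschitz), the nonsmooth term $g(x)=\lambda\|x\|_1$, the kernel $h(x)=\frac14\|x\|^4+\frac12\|x\|^2$, the update form $x_{k+1}\in\arg\min_x\{g(x)+\langle\tilde{\nabla}f(\bar{x}_k),x-\bar{x}_k\rangle+\frac{1}{\eta}D_h(x,\bar{x}_k)\}$ with $\bar{x}_k=x_k+\beta(x_k-x_{k-1})$, the fixed extrapolation weight $\beta$, and the step size rule. A plain minibatch gradient stands in for the variance-reduced estimators (SAGA, SARAH) mentioned above, and the function names are hypothetical.

```python
import numpy as np

def objective(A, b, x, lam):
    """Quartic data-fit term plus l1 penalty (illustrative problem, assumed)."""
    return 0.25 * np.mean(((A @ x) ** 2 - b) ** 2) + lam * np.abs(x).sum()

def minibatch_grad(A, b, x, idx):
    """Plain minibatch gradient of the quartic term (not SAGA/SARAH)."""
    Ai = A[idx]
    r = Ai @ x
    return Ai.T @ ((r ** 2 - b[idx]) * r) / len(idx)

def bregman_prox_l1(p, thresh):
    """Solve grad_h(x) + thresh * d||x||_1 containing p for h = ||x||^4/4 + ||x||^2/2.

    The solution is x = t * v, where v is the soft-thresholded p and
    t in (0, 1] is the root of ||v||^2 t^3 + t - 1 = 0.
    """
    v = np.sign(p) * np.maximum(np.abs(p) - thresh, 0.0)
    c = v @ v
    if c == 0.0:
        return np.zeros_like(v)
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on a strictly increasing cubic
        mid = 0.5 * (lo + hi)
        if c * mid ** 3 + mid - 1.0 > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi) * v

def bpsge_sketch(A, b, x0, lam=0.01, beta=0.3, iters=2000, batch=20, seed=0):
    """Extrapolate, take a stochastic gradient, then do a Bregman proximal step."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    row_sq = np.sum(A ** 2, axis=1)
    # Assumed relative-smoothness constant for this f with the quartic kernel.
    L = np.mean(3.0 * row_sq ** 2 + row_sq * np.abs(b))
    eta = 0.5 / L  # conservative step size (assumption)
    x_prev = x = x0.copy()
    for _ in range(iters):
        x_bar = x + beta * (x - x_prev)             # extrapolation
        idx = rng.choice(n, size=batch, replace=False)
        g = minibatch_grad(A, b, x_bar, idx)        # stochastic gradient at x_bar
        p = (x_bar @ x_bar + 1.0) * x_bar - eta * g  # grad_h(x_bar) - eta * g
        x_prev, x = x, bregman_prox_l1(p, eta * lam)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 5))
    b = (A @ rng.standard_normal(5)) ** 2
    x0 = 2.0 * np.ones(5)
    x = bpsge_sketch(A, b, x0)
    print(objective(A, b, x0, 0.01), objective(A, b, x, 0.01))
```

The quartic kernel is used here because it admits a closed-form Bregman proximal step for the $\ell_1$ penalty up to a scalar cubic equation. The paper's experiments concern matrix factorization problems, where the kernel and the subproblem solution would differ.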

Paper Structure

This paper contains 25 sections, 14 theorems, 59 equations, 3 figures, 3 tables, and 1 algorithm.

Introduction
Preliminary
Algorithm
Convergence Analysis
Subsequential Convergence Analysis
Global Convergence Analysis
Numerical Experiments
Graph Regularized NMF for Clustering
MF with Weakly-convex Regularization
NMF with Nonconvex Sparsity Constraints
Conclusion
Acknowledgments
Mathematical Proofs
Proof of Lemma \ref{lemma_Phi_kk1}
Proof of Lemma \ref{lyapunov_descent}
...and 10 more sections

Key Result

Lemma 1

Suppose Assumptions assume_01-assume_02 are satisfied and $\tilde{\nabla}f(\bar{x}_{k})$ satisfies the variance reduction property given in Definition vr_definition. Let $\{x_{k}\}$ be the sequence generated by Algorithm BPSGE. Then the following inequality holds for any $k>0$: [...] Here, $\gamma=\sqrt{2(V_{\Gamma}/\tau+V_{1})}$, $\alpha$ is the weak-convexity parameter in Assumption assume_02, [...]

Figures (3)

Figure 1: Numerical experiment results on the ORL and Yale-B datasets for problem \ref{WCMF}. Left: ORL with $r=25$. Right: Yale-B with $r=49$.
Figure 2: Numerical experiments for MF problem \ref{SSNMF}.
Figure 3: The basis images generated by solving the nonconvex sparsity constrained NMF problem \ref{SSNMF} with $s_{1}=\frac{m}{3}$ and $s_{2}=\frac{d}{2}$.

Theorems & Definitions (31)

Definition 1
Definition 2
Definition 3
Remark 1
Definition 4
Remark 2
Lemma 1
Lemma 2
Theorem 1
Proposition 1
...and 21 more
