Avoiding strict saddle points of nonconvex regularized problems

Luwei Bai; Yaohua Hu; Hao Wang; Xiaoqi Yang

TL;DR

This work analyzes a broad class of nonconvex, nonsmooth regularized problems for sparse optimization and proves that second-order optimality depends only on the nonzero support of stationary points. It introduces two damped iterative reweighted schemes, DIRL$_1$ and DIRL$_2$, with convergence guarantees and saddle-point avoidance grounded in a center-stable manifold framework; DIRL$_1$ benefits from active-manifold identification, while DIRL$_2$ achieves differentiable, Lipschitz subproblem mappings under suitable conditions. The core result is that, under the strict saddle property, the fixed-point iteration maps are lipeomorphisms whose unstable fixed points correspond to strict saddles, so that the iterates converge to local minima from almost every random initialization. This avoids relying on stochastic perturbations or active-manifold assumptions, broadening applicability to nonsmooth nonconvex regularizers such as the $\ell_p$ quasi-norm and related penalties. The framework provides a rigorous pathway to escaping saddles in nonsmooth optimization and yields globally convergent, saddle-avoiding algorithms for sparse modeling.
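The saddle-avoidance claim can be illustrated on a smooth toy problem. The sketch below is not taken from the paper: it runs plain gradient descent on the hypothetical function $g(x, y) = \tfrac{1}{2}x^2 + \tfrac{1}{4}y^4 - \tfrac{1}{2}y^2$, which has a strict saddle at $(0, 0)$ and local minimizers at $(0, \pm 1)$. The function, step size, and helper names are all assumptions chosen for illustration.

```python
import numpy as np

def grad_g(x, y):
    # Gradient of the toy function g(x, y) = 0.5*x^2 + 0.25*y^4 - 0.5*y^2.
    return x, y**3 - y

def gradient_descent(x0, y0, step=0.1, iters=1000):
    # Plain gradient descent; the iteration map has (0, 0) as an unstable
    # fixed point, because the Hessian there has the negative eigenvalue -1.
    x, y = x0, y0
    for _ in range(iters):
        gx, gy = grad_g(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random start in [-1, 1]^2; only starts with y0 == 0 stay attracted
    # to the saddle, and that set has measure zero.
    x0, y0 = rng.uniform(-1.0, 1.0, size=2)
    print(gradient_descent(x0, y0))
```

Only starting points on the line $y = 0$ converge to the saddle here, which is the measure-zero stable set that the center-stable manifold argument isolates in the general setting.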

Abstract

In this paper, we consider a class of nonconvex and nonsmooth sparse optimization problems that encompasses most existing nonconvex sparsity-inducing terms. We show that the second-order optimality conditions depend only on the nonzeros of the stationary points. We propose two damped iterative reweighted algorithms, the damped iteratively reweighted $\ell_1$ algorithm (DIRL$_1$) and the damped iteratively reweighted $\ell_2$ algorithm (DIRL$_2$), to solve these problems. For DIRL$_1$, we show that the reweighted $\ell_1$ subproblem has the support identification property, so that DIRL$_1$ locally reverts to a gradient descent algorithm around a stationary point. For DIRL$_2$, we show that the solution map of the reweighted $\ell_2$ subproblem is differentiable and Lipschitz continuous everywhere. Therefore, the iteration maps of DIRL$_1$ and DIRL$_2$ and their inverses are Lipschitz continuous, and the strict saddle points are their unstable fixed points. By applying the stable manifold theorem, we show that, with random initialization, these algorithms converge only to local minimizers when the strict saddle property is assumed.
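To make the idea of a damped reweighted $\ell_1$ iteration concrete, here is a minimal sketch under stated assumptions. It is not the authors' exact scheme. It assumes a least-squares loss $f(x) = \tfrac{1}{2}\|Ax - b\|^2$, a smoothed $\ell_p$ regularizer $\sum_i (|x_i| + \epsilon)^p$, a proximal-gradient step of size $1/L$ for the weighted $\ell_1$ subproblem, and a damping parameter `beta` that averages the current iterate with the subproblem solution. The names `dirl1_sketch`, `objective`, and `soft_threshold` are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    # Closed-form minimizer of 0.5*(y - z)^2 + t*|y|, applied elementwise.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(A, b, x, lam, p, eps):
    # Assumed objective: least squares plus a smoothed l_p penalty.
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum((np.abs(x) + eps) ** p)

def dirl1_sketch(A, b, x0, lam=0.1, p=0.5, eps=1e-2, beta=0.5, iters=300):
    # Step size 1/L, where L = ||A||_2^2 is the gradient Lipschitz constant.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = x0.copy()
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        # Weights from linearizing the concave map t -> (t + eps)^p at |x_i|.
        w = p * (np.abs(x) + eps) ** (p - 1)
        # Solve the weighted l_1 subproblem (one proximal-gradient step).
        s = soft_threshold(x - step * grad, step * lam * w)
        # Damping (assumed form): convex combination of x and the solution.
        x = (1.0 - beta) * x + beta * s
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 50))
    b = rng.standard_normal(20)
    x0 = rng.standard_normal(50)  # random initialization
    x = dirl1_sketch(A, b, x0)
    print(objective(A, b, x0, 0.1, 0.5, 1e-2), objective(A, b, x, 0.1, 0.5, 1e-2))
```

Under these assumptions each subproblem minimizes a convex majorizer of the objective, so the damped step cannot increase the objective value. This descent behaviour belongs to the sketch and is not a restatement of the paper's convergence theorems.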

Paper Structure (15 sections, 22 theorems, 89 equations, 1 table, 2 algorithms)

Introduction
The main results
Outline
Notation
Optimality and strict saddle points
Optimality conditions
Strict Saddle Points
Center Stable Manifold Theorem
Damped Iterative Reweighted $\ell_1$ Algorithms
Convergence analysis
Avoidance of strict saddle points
Damped Iterative Reweighted $\ell_2$ Algorithms
Convergence analysis
Avoidance of strict saddle points
Proof supplementary

Key Result

Proposition 2.2

It holds that $\partial F(x) = \hat{\partial} F(x) = \nabla f(x) + \lambda \partial \psi(x)$, where …

Theorems & Definitions (45)

Definition 1.1
Definition 2.1
Proposition 2.2
Proof 1
Proposition 2.3
Theorem 2.4
Proof 2
Definition 2.5
Definition 2.6
Theorem 2.7
...and 35 more