Avoiding strict saddle points of nonconvex regularized problems
Luwei Bai, Yaohua Hu, Hao Wang, Xiaoqi Yang
TL;DR
This work analyzes a broad class of nonconvex, nonsmooth regularized problems for sparse optimization and proves that second-order optimality depends only on the support (the nonzero components) of stationary points. It introduces two damped iterative reweighted schemes, DIRL$_1$ and DIRL$_2$, with convergence guarantees and saddle-point avoidance grounded in a center-stable manifold framework. DIRL$_1$ benefits from active-manifold (support) identification, while DIRL$_2$ has differentiable, Lipschitz continuous subproblem solution maps under suitable conditions. The core result is that, under the strict saddle property, the fixed-point iteration maps are lipeomorphisms (Lipschitz homeomorphisms with Lipschitz inverses) for which the strict saddle points are unstable fixed points, so the iterates converge to local minimizers from almost every random initialization. The analysis does not rely on stochastic perturbations or active-manifold assumptions, which broadens its applicability to nonsmooth nonconvex regularizers such as the $\ell_p$ quasi-norm and related penalties. The framework provides a rigorous pathway to escaping saddles in nonsmooth optimization and yields globally convergent, saddle-avoiding algorithms for sparse modeling.
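The mechanism behind the core result can be seen on a smooth toy problem. The sketch below assumes plain gradient descent on a hypothetical two-variable function (`f`, `grad`, `iterate` and the step size are illustrative choices, not either DIRL scheme); it shows that a strict saddle is an unstable fixed point of the iteration map and is reached only from initializations on its stable manifold, a set of measure zero.

```python
import random

# Hypothetical smooth toy objective (an assumption, not a problem from the paper):
#   f(x, y) = x^2/2 + y^4/4 - y^2/2
# It has a strict saddle at (0, 0) and local minimizers at (0, 1) and (0, -1).
def grad(x, y):
    return x, y ** 3 - y

def iterate(x, y, step=0.1, iters=500):
    # Repeatedly apply the fixed-point map G(z) = z - step * grad f(z).
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y

step = 0.1
# DG(0, 0) = I - step * Hess f(0, 0) = diag(1 - step, 1 + step); the
# eigenvalue 1 + step > 1 makes the strict saddle an unstable fixed point.
print("eigenvalues of DG at the saddle:", 1 - step, 1 + step)

# Starting on the stable manifold {y = 0} (a null set) converges to the saddle.
print("start (0.7, 0.0) ->", iterate(0.7, 0.0, step))

# Random starts have y != 0 with probability one and reach a local minimizer.
random.seed(0)
for _ in range(3):
    x0, y0 = random.uniform(-1, 1), random.uniform(-1, 1)
    print(f"start ({x0:.3f}, {y0:.3f}) ->", iterate(x0, y0, step))
```

In the paper's setting the iteration maps are those of DIRL$_1$ and DIRL$_2$ on nonsmooth problems, where the Lipschitz (rather than smooth) structure of the maps is what the center-stable manifold argument has to work with.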
Abstract
In this paper, we consider a class of nonconvex and nonsmooth sparse optimization problems that encompasses most existing nonconvex sparsity-inducing terms. We show that the second-order optimality conditions depend only on the nonzero components of the stationary points. To solve these problems, we propose two damped iterative reweighted algorithms: the iteratively reweighted $\ell_1$ algorithm (DIRL$_1$) and the iteratively reweighted $\ell_2$ algorithm (DIRL$_2$). For DIRL$_1$, we show that the reweighted $\ell_1$ subproblem has the support identification property, so that DIRL$_1$ locally reverts to a gradient descent algorithm around a stationary point. For DIRL$_2$, we show that the solution map of the reweighted $\ell_2$ subproblem is differentiable and Lipschitz continuous everywhere. Therefore, the iteration maps of DIRL$_1$ and DIRL$_2$ and their inverses are Lipschitz continuous, and the strict saddle points are unstable fixed points of these maps. By applying the stable manifold theorem, we show that these algorithms, when randomly initialized, converge only to local minimizers under the strict saddle property.
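To make the two kinds of subproblem concrete, here is a minimal sketch of undamped proximal reweighted $\ell_1$ and $\ell_2$ steps on a hypothetical least-squares instance. Everything in it is an assumption for illustration: the matrix `A`, vector `b`, the smoothed penalties $(|t|+\epsilon)^p$ and $(t^2+\epsilon^2)^{p/2}$ with a fixed `EPS`, the step size `STEP`, and the helper names `irl1_step`, `irl2_step`, `F_l1`, `F_l2` and `run`. The damping used in DIRL$_1$ and DIRL$_2$ is not specified in the abstract and is omitted, so this is not the authors' method.

```python
import math

# Hypothetical toy instance (not from the paper):
#   F(x) = 0.5 * ||A x - b||^2 + LAM * sum_i r(x_i)
A = [[1.0, 0.2, 0.0, 0.1],
     [0.0, 1.0, 0.3, 0.0],
     [0.2, 0.0, 1.0, 0.4]]
b = [1.0, 0.0, 0.5]
LAM, P, EPS = 0.1, 0.5, 1e-2  # weight, exponent p in (0, 1), assumed fixed smoothing eps > 0
STEP = 0.25                    # alpha <= 1/||A||_F^2 <= 1/L, so each step decreases F

def residual(x):
    return [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]

def grad_f(x):
    r = residual(x)
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]

def F_l1(x):  # penalty r(t) = (|t| + EPS)^P, nonsmooth at t = 0
    return 0.5 * sum(ri * ri for ri in residual(x)) + LAM * sum((abs(xi) + EPS) ** P for xi in x)

def F_l2(x):  # penalty r(t) = (t^2 + EPS^2)^(P/2), smooth everywhere
    return 0.5 * sum(ri * ri for ri in residual(x)) + LAM * sum((xi * xi + EPS * EPS) ** (P / 2) for xi in x)

def irl1_step(x):
    # Linearize the concave penalty at |x_i| to get weights w_i, then solve the
    # weighted-l1 proximal subproblem in closed form by soft-thresholding.
    out = []
    for xi, gi in zip(x, grad_f(x)):
        w = LAM * P * (abs(xi) + EPS) ** (P - 1)
        z = xi - STEP * gi
        out.append(math.copysign(max(abs(z) - STEP * w, 0.0), z))
    return out

def irl2_step(x):
    # Linearize the concave penalty in x_i^2 to get weights w_i; the weighted-l2
    # subproblem has the closed-form solution below, which is smooth in x.
    out = []
    for xi, gi in zip(x, grad_f(x)):
        w = LAM * (P / 2) * (xi * xi + EPS * EPS) ** (P / 2 - 1)
        out.append((xi - STEP * gi) / (1 + 2 * STEP * w))
    return out

def run(step_fn, obj, x, iters=300):
    # Iterate the fixed-point map and record the objective after every step.
    history = [obj(x)]
    for _ in range(iters):
        x = step_fn(x)
        history.append(obj(x))
    return x, history

x1, h1 = run(irl1_step, F_l1, [0.5] * 4)
print("reweighted l1:", x1, "support:", [i for i, v in enumerate(x1) if v != 0.0])
x2, h2 = run(irl2_step, F_l2, [0.5] * 4)
print("reweighted l2:", x2)
```

The design difference mirrors the abstract: soft-thresholding in the $\ell_1$ step sets coordinates exactly to zero, which is what makes support identification possible, whereas the $\ell_2$ step only shrinks coordinates through division by $1 + 2\alpha w_i$, a map that is differentiable in $x$ whenever `EPS` > 0. Because both steps minimize a majorizer of $F$, the recorded objective values are nonincreasing.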
