Alternating Iteratively Reweighted $\ell_1$ and Subspace Newton Algorithms for Nonconvex Sparse Optimization
Hao Wang, Xiangyu Yang, Yichen Zhu
TL;DR
The paper tackles nonconvex sparse optimization, formulated as $\min_x f(x)+\lambda h(x)$ with $h(x)=\sum_i (r\circ|\cdot|)(x_i)$, and proposes IReNA, a hybrid algorithm that alternates subspace iteratively reweighted $\ell_1$ steps with subspace Newton updates. Two optimality residuals are used to identify and exploit the relevant subspace; the reweighted $\ell_1$ subproblems have closed-form solutions via soft-thresholding, and Newton steps on the active set accelerate convergence. The authors prove global convergence to a critical point, establish local convergence rates under the Kurdyka-Łojasiewicz (KL) property, and show quadratic convergence when exact Newton steps are used; they also discuss a trust-region variant with a similar local complexity bound. Numerical experiments on logistic regression with various nonconvex regularizers and real datasets show improved efficiency and high-quality sparse solutions compared to state-of-the-art hybrid methods. Overall, IReNA offers a scalable framework with convergence guarantees for a broad class of nonconvex sparse regularizers.
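To illustrate why Newton steps become available on the active set, consider the objective restricted to a fixed support. The notation below ($\mathcal{S}$, $F_{\mathcal{S}}$) is introduced here only for illustration, and the sketch assumes that $r$ is differentiable on $(0,\infty)$, which this summary does not state explicitly. With $\mathcal{S}=\{i : x_i \neq 0\}$ and the remaining components held at zero, the restricted objective (up to the constant contributed by the zero components) and its gradient are

$$
F_{\mathcal{S}}(x_{\mathcal{S}}) = f(x_{\mathcal{S}}, 0) + \lambda \sum_{i \in \mathcal{S}} r(|x_i|),
\qquad
[\nabla F_{\mathcal{S}}(x_{\mathcal{S}})]_i = \nabla_i f(x) + \lambda\, r'(|x_i|)\,\operatorname{sign}(x_i), \quad i \in \mathcal{S}.
$$

Because no component in $\mathcal{S}$ is zero, $|x_i|$ is smooth near the current point, so $F_{\mathcal{S}}$ is locally smooth and second-order steps can, in principle, be applied to it.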
Abstract
This paper presents a novel hybrid algorithm for minimizing the sum of a continuously differentiable loss function and a nonsmooth, possibly nonconvex, sparse regularization function. The proposed method alternates between solving a reweighted $\ell_1$-regularized subproblem and performing an inexact subspace Newton step. The reweighted $\ell_1$-subproblem allows for efficient closed-form solutions via the soft-thresholding operator, avoiding the computational overhead of proximity operator calculations. As the algorithm approaches an optimal solution, it maintains a stable support set, ensuring that nonzero components stay uniformly bounded away from zero. It then switches to a perturbed regularized Newton method, further accelerating the convergence. We prove global convergence to a critical point and, under suitable conditions, demonstrate that the algorithm exhibits local linear and quadratic convergence rates. Numerical experiments show that our algorithm outperforms existing methods in both efficiency and solution quality across various model prediction problems.
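The closed-form reweighted $\ell_1$ update mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a proximal-gradient-style step with a fixed step size, and it uses the log-sum regularizer $r(t)=\log(1+t/\epsilon)$ purely as an example, so that the weights are $r'(|x_i|)=1/(|x_i|+\epsilon)$. The function names `soft_threshold` and `reweighted_l1_step` and the parameters `step` and `eps` are hypothetical.

```python
import numpy as np


def soft_threshold(z, tau):
    """Elementwise soft-thresholding: sign(z) * max(|z| - tau, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def reweighted_l1_step(x, grad, lam, step, eps=1e-3):
    """One illustrative reweighted l1 step (an assumption-laden sketch).

    Assumes the log-sum regularizer r(t) = log(1 + t/eps), whose
    derivative r'(t) = 1 / (t + eps) gives the l1 weights. The weighted
    l1 subproblem then separates by coordinate and is solved exactly
    by soft-thresholding a gradient step.
    """
    weights = 1.0 / (np.abs(x) + eps)   # w_i = r'(|x_i|)
    z = x - step * grad                 # gradient step on the smooth loss f
    return soft_threshold(z, step * lam * weights)


if __name__ == "__main__":
    # Toy least-squares loss f(x) = 0.5 * ||A x - b||^2 (illustrative data only).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 5))
    b = rng.standard_normal(20)
    x = np.zeros(5)
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of grad f
    for _ in range(50):
        x = reweighted_l1_step(x, A.T @ (A @ x - b), lam=0.1, step=step, eps=0.1)
    print(x)
```

Each coordinate is thresholded by its own amount `step * lam * weights[i]`, so components that are already large are penalized less than components near zero, which is the mechanism by which reweighting promotes sparsity.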
