Extrapolated Hard Thresholding Algorithms with Finite Length for Composite $\ell_0$ Penalized Problems
Fan Wu, Jiazhen Wei, Wei Bian
TL;DR
The paper tackles sparse optimization with a composite $\ell_0$ penalty built from the Heaviside function: it minimizes $F(x)=f(x)+\lambda_1\|x_+\|_0+\lambda_2\|x_-\|_0$ over a box constraint set $\Omega$, where $f$ is convex with a Lipschitz continuous gradient. It develops an extrapolated hard-thresholding algorithm obtained by discretizing an inertial gradient system with dry friction and Hessian-driven damping. For a dry friction coefficient $\epsilon>0$, it proves that the iterates have finite length, $\sum_k\|x^{k+1}-x^k\|<\infty$, without using the Kurdyka-Łojasiewicz property, and that they converge to an $\epsilon$-local minimizer; for $\epsilon=0$, every accumulation point is a local minimizer. The authors give equivalent characterizations of local minimizers for both $\lambda_2>0$ and $\lambda_2=0$, show that the convergence results still hold when the iterations contain errors that vanish asymptotically, and report numerical experiments in which the method improves speed and sparsity recovery over existing approaches. The work provides a convergence framework for nonconvex, nonsmooth $\ell_0$-penalized problems that does not rely on the Kurdyka-Łojasiewicz property, with practical relevance to sparse modeling.
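Because both penalty terms and the box $\Omega$ are separable, a hard-thresholding step for $F$ reduces to a coordinatewise comparison. The sketch below illustrates this under stated assumptions: $\Omega=\{x : l\le x\le u\}$ with $l<0<u$ componentwise and a step size $\tau>0$. The names `prox_composite_l0` and `objective` are hypothetical, and this is not presented as the authors' implementation. The sketch solves $\min_{x\in\Omega}\ \tfrac{1}{2\tau}\|x-z\|^2+\lambda_1\|x_+\|_0+\lambda_2\|x_-\|_0$ by comparing, for each coordinate, three candidates: the value $0$, the best positive value in $(0,u_i]$ (which pays $\lambda_1$), and the best negative value in $[l_i,0)$ (which pays $\lambda_2$).

```python
import numpy as np


def prox_composite_l0(z, tau, lam1, lam2, lower, upper):
    """Hypothetical helper: coordinatewise minimizer of
    (1/(2*tau))*||x - z||^2 + lam1*||x_+||_0 + lam2*||x_-||_0 over lower <= x <= upper.
    Assumes lower < 0 < upper componentwise; ties are broken toward 0."""
    z = np.asarray(z, dtype=float)
    t_pos = np.clip(z, 0.0, upper)   # best value if the entry is kept positive
    t_neg = np.clip(z, lower, 0.0)   # best value if the entry is kept negative
    cost_zero = z**2 / (2 * tau)
    # A positive (negative) candidate only exists when z itself is positive (negative).
    cost_pos = np.where(z > 0, (t_pos - z) ** 2 / (2 * tau) + lam1, np.inf)
    cost_neg = np.where(z < 0, (t_neg - z) ** 2 / (2 * tau) + lam2, np.inf)
    return np.where(cost_pos < cost_zero, t_pos,
                    np.where(cost_neg < cost_zero, t_neg, 0.0))


def objective(x, f, lam1, lam2, lower, upper):
    """F(x) = f(x) + lam1*||x_+||_0 + lam2*||x_-||_0, and +inf outside the box."""
    x = np.asarray(x, dtype=float)
    if not np.all((x >= lower) & (x <= upper)):
        return np.inf
    return f(x) + lam1 * np.count_nonzero(x > 0) + lam2 * np.count_nonzero(x < 0)


z = np.array([3.0, 0.5, -2.0, -0.2])
x = prox_composite_l0(z, tau=1.0, lam1=1.0, lam2=1.0, lower=-1.5, upper=2.5)
# x == [2.5, 0.0, -1.5, 0.0]
```

Without the box, a positive entry $z_i$ survives only when $z_i>\sqrt{2\tau\lambda_1}$, which is the familiar hard-thresholding rule. With $\lambda_2=0$, negative entries are never zeroed and are only clipped to the box. Breaking ties toward $0$ is one possible convention.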
Abstract
For a class of sparse optimization problems with the penalty $\|(\cdot)_+\|_0$, we first characterize the local minimizers and then propose an extrapolated hard thresholding algorithm to solve such problems. We show that the iterates generated by the proposed algorithm with $\epsilon>0$ (where $\epsilon$ is the dry friction coefficient) have finite length, without relying on the Kurdyka-Łojasiewicz inequality. Furthermore, we demonstrate that the algorithm converges to an $\epsilon$-local minimizer of this problem. In the special case $\epsilon=0$, we establish that any accumulation point of the iterates is a local minimizer of the problem. Additionally, we analyze the convergence when an error term is present in the algorithm and show that the same convergence results hold, provided that the errors vanish asymptotically. Finally, we conduct numerical experiments to verify the theoretical results for the proposed algorithm.
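The abstract's algorithm alternates extrapolation with hard thresholding. The loop below is a generic extrapolated proximal-gradient sketch in that spirit, not the authors' scheme. It omits the dry friction and Hessian-driven damping terms, so it does not model the role of $\epsilon$. It also assumes a least-squares loss $f(x)=\tfrac12\|Ax-b\|^2$, a fixed extrapolation weight `beta`, and a step size $1/L$ with $L=\|A\|_2^2$. The optional `grad_error` callable is hypothetical and injects an error term, assumed here to enter through the gradient. The returned `path_length` accumulates $\sum_k\|x^{k+1}-x^k\|$, the quantity that the finite-length result bounds. The thresholding helper is repeated so that the block runs on its own, and no convergence guarantee is claimed for this sketch.

```python
import numpy as np


def prox_composite_l0(z, tau, lam1, lam2, lower, upper):
    """Coordinatewise hard-thresholding prox over the box (assumes lower < 0 < upper)."""
    z = np.asarray(z, dtype=float)
    t_pos, t_neg = np.clip(z, 0.0, upper), np.clip(z, lower, 0.0)
    c_zero = z**2 / (2 * tau)
    c_pos = np.where(z > 0, (t_pos - z) ** 2 / (2 * tau) + lam1, np.inf)
    c_neg = np.where(z < 0, (t_neg - z) ** 2 / (2 * tau) + lam2, np.inf)
    return np.where(c_pos < c_zero, t_pos, np.where(c_neg < c_zero, t_neg, 0.0))


def extrapolated_hard_thresholding(A, b, lam1, lam2, lower, upper, beta=0.3,
                                   max_iter=500, tol=1e-10, grad_error=None):
    """Hypothetical sketch for f(x) = 0.5*||Ax - b||^2; omits dry friction and
    Hessian-driven damping. Returns the last iterate and sum_k ||x^{k+1} - x^k||."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f
    tau = 1.0 / L
    x_prev = x = np.zeros(A.shape[1])
    path_length = 0.0
    for k in range(max_iter):
        y = x + beta * (x - x_prev)        # extrapolation (inertial) step
        grad = A.T @ (A @ y - b)
        if grad_error is not None:         # optional error term e_k, assumed to enter the gradient
            grad = grad + grad_error(k)
        x_new = prox_composite_l0(y - tau * grad, tau, lam1, lam2, lower, upper)
        step = np.linalg.norm(x_new - x)
        path_length += step                # running total of ||x^{k+1} - x^k||
        x_prev, x = x, x_new
        if step <= tol:
            break
    return x, path_length
```

As a sanity check rather than a demonstration of the paper's theory, take `A = np.eye(4)`. Every gradient step then maps the extrapolated point back to `b`, so with `b = [3.0, 0.5, -2.0, -0.2]`, $\lambda_1=\lambda_2=1$ and the box $[-1.5, 2.5]$, the iterates settle after one step at `[2.5, 0, -1.5, 0]`.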
