
Integral control of the proximal gradient method for unbiased sparse optimization

V. Cerone, S. M. Fosson, A. Re, D. Regruto

TL;DR

The paper tackles biased solutions in sparse optimization by introducing I-ISTA, an integral-control version of ISTA that applies a feedback law to the regularization parameter to drive the gradient to zero without substantially increasing the computational burden. It provides a convergence analysis for $\mu$-strongly convex, $\beta$-smooth objectives and validates the approach numerically in both the strongly and the non-strongly convex regimes, showing unbiased recovery with iteration counts comparable to state-of-the-art gradient methods. The key contribution is a principled control-theoretic design that preserves sparsity while eliminating bias, enabling efficient, unbiased sparse recovery in practical scenarios. The approach is potentially relevant to real-time and embedded applications, where parsimonious models are essential.

Abstract

Proximal gradient methods are popular in sparse optimization because they are straightforward to implement. Nevertheless, they yield biased solutions and require many iterations to converge. This work addresses these issues through a suitable feedback control of the algorithm's hyperparameter. Specifically, by designing an integral control that does not substantially increase the computational complexity, we can reach an unbiased solution in a reasonable number of iterations. In the paper, we develop the proposed approach and analyze its convergence for strongly convex problems. Moreover, numerical simulations validate the theoretical results and extend them to the non-strongly convex setting.
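To make the bias concrete, here is a minimal sketch of plain ISTA (proximal gradient with soft thresholding) on the least-squares loss $f(x)=\frac{1}{2}\|Ax-y\|_2^2$. This loss is an assumption, chosen to match the residual $\|Ax(k)-y\|_2$ plotted in the figures. With the toy choice $A=I$ and a fixed $\lambda$, the ISTA fixed point is the soft-thresholded vector $S_{\lambda}(y)$, so every nonzero entry is shrunk by $\lambda$. The function names and toy data are illustrative and are not taken from the paper.

```python
# Minimal ISTA sketch for 0.5 * ||A x - y||_2^2 + lam * ||x||_1 (assumed loss).
# Pure Python lists so it runs without NumPy; A is given as a list of rows.


def soft_threshold(v, t):
    """Componentwise soft thresholding: the proximal operator of t * ||.||_1."""
    out = []
    for vi in v:
        if vi > t:
            out.append(vi - t)
        elif vi < -t:
            out.append(vi + t)
        else:
            out.append(0.0)
    return out


def grad_ls(A, x, y):
    """Gradient of 0.5 * ||A x - y||^2, i.e. A^T (A x - y)."""
    r = [sum(a * b for a, b in zip(row, x)) - yi for row, yi in zip(A, y)]
    return [sum(A[j][i] * r[j] for j in range(len(A))) for i in range(len(x))]


def ista(A, y, lam, tau, iters):
    """Fixed-lambda ISTA from x = 0: x <- S_{tau*lam}(x - tau * grad f(x))."""
    x = [0.0] * len(A[0])
    for _ in range(iters):
        g = grad_ls(A, x, y)
        x = soft_threshold([xi - tau * gi for xi, gi in zip(x, g)], tau * lam)
    return x


# Toy case A = I: the sparse vector y itself is the unbiased solution,
# but ISTA settles at S_lam(y), shrinking each nonzero entry by lam.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.0, -2.0]
x_hat = ista(A, y, lam=0.5, tau=1.0, iters=50)
print(x_hat)  # [2.5, 0.0, -1.5]
```

The zero entry is correctly kept at zero, while the support entries are off by exactly $\lambda=0.5$: this is the bias that a smaller $\lambda$ would reduce, at the cost of weaker sparsity promotion during the transient.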

Paper Structure

This paper contains 9 sections, 2 theorems, 19 equations, 6 figures, 1 table.

Key Result

Lemma 1

Suppose Assumption ass:mu holds. If $\alpha>|k_i|$, the equilibrium point $(x^{\star},\lambda^{\star})$ of I-ISTA satisfies $\nabla f(x^{\star})=0$ and $\lambda^{\star}=0$. In particular, $x^{\star}$ is the unique, hence sparse, minimizer of $f$.
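The summary does not state the I-ISTA update law, so the following is only a hypothetical sketch of the mechanism behind Lemma 1, not the paper's algorithm. It treats the threshold $\lambda$ as a control input and updates it with a clamped integral law driven by the gradient norm, $\lambda(k+1)=\max\{0,\lambda(k)-g\,\|\nabla f(x(k+1))\|_2\}$. The gain $g$ (`gain` in the code) is our own parameter and is not the paper's $k_i$ or $\alpha$. The loss is again the assumed least-squares loss. In this sketch, if $\lambda^\star>0$ at an equilibrium, stationarity of $\lambda$ forces $\nabla f(x^\star)=0$, and then $x^\star=S_{\tau\lambda^\star}(x^\star)$ forces $x^\star=0$. Hence any equilibrium with $x^\star\neq 0$ has $\lambda^\star=0$ and, from the remaining gradient step, $\nabla f(x^\star)=0$, which mirrors the conclusions of Lemma 1.

```python
# HYPOTHETICAL integral-feedback variant of ISTA (not the paper's I-ISTA law).
#   x(k+1)   = S_{tau*lam(k)}(x(k) - tau * grad f(x(k)))
#   lam(k+1) = max(0, lam(k) - gain * ||grad f(x(k+1))||_2)   # clamped integrator
# Assumed loss: f(x) = 0.5 * ||A x - y||_2^2.
import math


def soft_threshold(v, t):
    """Componentwise soft thresholding: the proximal operator of t * ||.||_1."""
    out = []
    for vi in v:
        if vi > t:
            out.append(vi - t)
        elif vi < -t:
            out.append(vi + t)
        else:
            out.append(0.0)
    return out


def grad_ls(A, x, y):
    """Gradient of 0.5 * ||A x - y||^2, i.e. A^T (A x - y)."""
    r = [sum(a * b for a, b in zip(row, x)) - yi for row, yi in zip(A, y)]
    return [sum(A[j][i] * r[j] for j in range(len(A))) for i in range(len(x))]


def integral_ista_sketch(A, y, lam0, tau, gain, iters):
    """Run the hypothetical scheme from x = 0; return the final (x, lam)."""
    x = [0.0] * len(A[0])
    lam = lam0
    for _ in range(iters):
        g = grad_ls(A, x, y)
        x = soft_threshold([xi - tau * gi for xi, gi in zip(x, g)], tau * lam)
        # Error signal: Euclidean norm of the gradient at the new iterate.
        err = math.sqrt(sum(gi * gi for gi in grad_ls(A, x, y)))
        # Integral action: lam accumulates (minus) the error, clamped at 0.
        lam = max(0.0, lam - gain * err)
    return x, lam


A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.0, -2.0]
x, lam = integral_ista_sketch(A, y, lam0=1.0, tau=1.0, gain=0.2, iters=200)
print(x, lam)
```

On this toy problem, $\lambda(k)$ shrinks by a constant factor per iteration and $x(k)$ approaches the unbiased $y$ while the zero entry stays exactly zero. In floating point, $\lambda$ stalls near machine precision once the gradient rounds to zero. This simple law is not claimed to enjoy the convergence guarantees proven for I-ISTA.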

Figures (6)

  • Figure 1: Example 1: $m=210$. Residual $\|Ax(k)-y\|_2$ with respect to $\|x(k)\|_1$ in a single run. The curves are parametrized with time. "True" refers to the value of $\widetilde{x}$. On the left, we show the overall trajectory; we label iterations 1, 10, 50. On the right, we magnify the figure around $\widetilde{x}$ and report the convergence step. The gradient descent method (GRAD) also reaches $\widetilde{x}$, but only after more iterations than the set maximum of $5\times10^4$.
  • Figure 2: Example 1: $m=210$. Evolution of the relative error $\|x(k)-\widetilde{x}\|_2/\|\widetilde{x}\|_2$ (left) and of the residual $\|Ax(k)-y\|_2$ (right), averaged over 100 runs.
  • Figure 3: Example 1: $m=210$. Evolution of the support error $\sum_{i=1}^n |\mathrm{1}(x_i(k))-\mathrm{1}(\widetilde{x}_i)|$ (left) and of the sparsity level $\|x(k)\|_0$ (right), averaged over 100 runs. The support-error curves are interrupted when the error reaches zero.
  • Figure 4: Example 2: $m=150$. Residual $\|Ax(k)-y\|_2$ with respect to $\|x(k)\|_1$ in a single run. The curves are parametrized with time. "True" refers to the value of $\widetilde{x}$. On the left, we show the overall trajectory; we label iterations 1, 10, 50. On the right, we magnify the figure around $\widetilde{x}$ and report the convergence step.
  • Figure 5: Example 2: $m=150$. Evolution of the relative error $\|x(k)-\widetilde{x}\|_2/\|\widetilde{x}\|_2$ (left) and the residual $\|Ax(k)-y\|_2$ (right), averaged over 100 runs.
  • ...and 1 more figure
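The captions above rely on four diagnostics: relative error, residual, support error, and sparsity level. A minimal sketch of how they might be computed is below. It assumes that $\mathrm{1}(\cdot)$ denotes the indicator of a nonzero entry; the optional tolerance `tol` for deciding "nonzero" is a hypothetical addition, not something stated in the paper.

```python
# Sketch of the diagnostics named in the figure captions (pure Python).
# Assumption: 1(v) = 1 if |v| > tol else 0; tol is a hypothetical knob.
import math


def relative_error(x, x_true):
    """||x - x_true||_2 / ||x_true||_2."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_true)))
    den = math.sqrt(sum(b * b for b in x_true))
    return num / den


def residual(A, x, y):
    """||A x - y||_2, with A given as a list of rows."""
    return math.sqrt(sum((sum(a * b for a, b in zip(row, x)) - yi) ** 2
                         for row, yi in zip(A, y)))


def indicator(v, tol=0.0):
    """1 if the entry counts as nonzero, else 0."""
    return 1 if abs(v) > tol else 0


def support_error(x, x_true, tol=0.0):
    """sum_i |1(x_i) - 1(x_true_i)|: number of mismatched support entries."""
    return sum(abs(indicator(a, tol) - indicator(b, tol))
               for a, b in zip(x, x_true))


def sparsity_level(x, tol=0.0):
    """||x||_0 (up to tol): number of nonzero entries."""
    return sum(indicator(a, tol) for a in x)


x_true = [3.0, 0.0, -2.0, 0.0]
x = [2.5, 0.1, -1.5, 0.0]
print(support_error(x, x_true), sparsity_level(x))  # 1 3
```

In this usage example, the spurious entry $0.1$ produces one support mismatch, and the estimate has three nonzero entries against two in the true vector.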

Theorems & Definitions (5)

  • Lemma 1
  • proof
  • Proposition 1
  • proof
  • Remark 1