Geometry-dependent matching pursuit: a transition phase for convergence on linear regression and LASSO

Céline Moucer, Adrien Taylor, Francis Bach

TL;DR

This work proposes a principled approach to generating (regularized) matching pursuit algorithms adapted to the geometry of the problem at hand. It also derives approximate convergence guarantees and describes a transition phenomenon in the convergence of (regularized) matching pursuit from underparametrized to overparametrized models.

Abstract

Greedy first-order methods, such as coordinate descent with the Gauss-Southwell rule or matching pursuit, have become popular in optimization due to their natural tendency to propose sparse solutions and their refined convergence guarantees. In this work, we propose a principled approach to generating (regularized) matching pursuit algorithms adapted to the geometry of the problem at hand, together with their convergence guarantees. Building on these results, we derive approximate convergence guarantees and describe a transition phenomenon in the convergence of (regularized) matching pursuit from underparametrized to overparametrized models.
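
As a rough illustration of the greedy methods mentioned in the abstract, the sketch below runs coordinate descent with the Gauss-Southwell rule on the least-squares objective $F(\alpha) = \frac{1}{n}\|P\alpha - y\|_2^2$. It is a textbook variant with exact minimization along the selected coordinate, not the geometry-dependent algorithm of the paper. The function name `gs_coordinate_descent` and the toy data in the usage note are assumptions made for illustration only.

```python
def gs_coordinate_descent(P, y, steps):
    """Greedy coordinate descent on F(a) = (1/n) * ||P a - y||_2^2.

    P is a list of n rows of length d, y a list of length n.
    Illustrative sketch only; not the paper's algorithm.
    """
    n, d = len(P), len(P[0])
    alpha = [0.0] * d
    residual = [-yi for yi in y]  # equals P @ alpha - y at alpha = 0
    # Curvature of F along coordinate j: (2/n) * ||P[:, j]||^2
    curv = [2.0 / n * sum(P[i][j] ** 2 for i in range(n)) for j in range(d)]
    values = []  # F(alpha) recorded before each update
    for _ in range(steps):
        values.append(sum(r * r for r in residual) / n)
        # Full gradient: (2/n) * P^T (P alpha - y)
        grad = [2.0 / n * sum(P[i][j] * residual[i] for i in range(n))
                for j in range(d)]
        # Gauss-Southwell rule: coordinate with the largest |gradient entry|
        j = max(range(d), key=lambda k: abs(grad[k]))
        if curv[j] == 0.0:  # zero column, hence zero gradient: nothing to do
            break
        step = grad[j] / curv[j]  # exact minimizer along coordinate j
        alpha[j] -= step
        for i in range(n):  # keep the residual in sync with alpha
            residual[i] -= step * P[i][j]
    return alpha, values
```

For instance, calling `gs_coordinate_descent([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]], [1.0, 2.0, 3.0], 200)` returns the iterate and the recorded function values, which are non-increasing because each update minimizes $F$ exactly along one coordinate. Only one coordinate changes per step, which is why such methods tend to produce sparse iterates when started from zero.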

Paper Structure

This paper contains 37 sections, 29 theorems, 79 equations, 9 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

Let $F = \frac{1}{n}\|P\alpha - y\|_2^2$, where $P \in \mathbb{R}^{n \times d}$. Then, $F$ is $\mu^F$-strongly convex with respect to a norm $\|\cdot\|$ with,

Figures (9)

  • Figure 1: $\epsilon$-curve for gradient descent (top) and coordinate descent with the GS-rule (bottom), for the three models: synthetic quadratics (on the left) with $n=50$, the leukemia dataset (in the middle) with $n=72$, a random feature model (on the right) with $n=72$.
  • Figure 2: Convergence in function value for gradient descent and coordinate descent with the GS rule, on synthetic quadratics ($n=20$) and on the leukemia dataset ($n=72$), for different values of $d$. Dashed lines: comparison to the approximate convergence guarantees from Corollary \ref{under_over_corollary} for synthetic quadratics, and to high-probability estimates for the leukemia dataset from Proposition \ref{big_theo_approx}.
  • Figure 3: Convergence in function value of proximal gradient descent, of coordinate descent with the Gauss-Southwell rule and with $L=L_1^F$ (instead of $L=L_2^F$), and of regularized matching pursuit (RMP), for synthetic quadratics (see Section \ref{sec:random_features}) with $n=50$, $s=8$, $\lambda = 0.001$, $\sigma = 0.5$: for $d=30$ starting from zero on the left (underparametrized regime), for $d=30$ starting from a nonzero point in the middle (underparametrized regime), and for $d=500$ on the right (overparametrized regime). RMP and coordinate descent with the GS rule match exactly in these examples.
  • Figure 4: Convergence in function value for the proximal gradient method on the left and regularized matching pursuit on the right, for $n = 50$, $d=500$, a sparsity $s=8$, and several values of the penalty $\lambda$. Dashed lines: comparison to the local convergence guarantee, taken on the support $S$ of the last iterates, and to the SDP relaxation from Proposition \ref{Exact_approx_GS}.
  • Figure 5: $\epsilon$-curve of the proximal gradient method, coordinate descent with the GS rule, and regularized matching pursuit for a LASSO problem with $d=500$, $n=50$, a sparsity level $s=8$, $\sigma= 0.5$, after $k=10\,000$ iterations, for several values of $\lambda$.
  • ...and 4 more figures

Theorems & Definitions (33)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Proposition 4
  • Theorem 5: Limits of extreme eigenvalues (Theorem 5.11 of Bai2010)
  • Corollary 6
  • Proposition 7
  • Proposition 8
  • Proposition 9
  • Corollary 10
  • ...and 23 more