Straight-Through meets Sparse Recovery: the Support Exploration Algorithm

Mimoun Mohamed, François Malgouyres, Valentin Emiya, Caroline Chaux

TL;DR

This work repurposes the straight-through estimator (STE) from quantized neural networks to the sparse support recovery problem, formulating a sparsification-based objective $F(H(\mathcal{X}))$ and introducing the Support Exploration Algorithm (SEA). SEA maintains a dense exploration vector $\mathcal{X}$, selects a $k$-sparse support via $S^t=\text{largest}_k(\mathcal{X}^t)$, and updates $\mathcal{X}$ through an STE-inspired gradient, enabling broader exploration of candidate supports than traditional greedy methods. The authors establish RIP-based recovery guarantees (Recovery-RIP and related corollaries) showing that SEA can recover the true support under certain incoherence/noise conditions, and they demonstrate substantial empirical gains in coherent settings (e.g., spike deconvolution) where standard methods falter. The results highlight SEA’s potential as both a standalone sparse-recovery method and a post-processing step to improve existing solvers, with practical implications for real-world inverse problems and potential extensions to neural-network sparsification contexts.
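The loop described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the least-squares loss $F(x)=\tfrac12\|Ax-y\|_2^2$, the least-squares refit on the selected support, the initialization $\mathcal{X}^0=A^\top y$, the step size, and the names `largest_k` and `sea_sketch` are all choices made here for illustration. The straight-through idea appears in the last line of the loop: the gradient is evaluated at the sparse iterate but applied to the dense exploration vector.

```python
import numpy as np

def largest_k(v, k):
    """Indices of the k entries of v with the largest magnitude (sorted)."""
    return np.sort(np.argpartition(np.abs(v), -k)[-k:])

def sea_sketch(A, y, k, n_iter=200, step=None):
    """Illustrative support-exploration loop (assumptions noted in comments)."""
    n = A.shape[1]
    if step is None:
        # Assumed step size: inverse squared spectral norm of A.
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    X = A.T @ y                      # assumed initialization of the dense vector
    best_x, best_loss = np.zeros(n), np.inf
    for _ in range(n_iter):
        S = largest_k(X, k)          # S^t = largest_k(X^t)
        x = np.zeros(n)
        # Assumed sparsification step: least-squares fit restricted to S.
        x[S] = np.linalg.lstsq(A[:, S], y, rcond=None)[0]
        r = A @ x - y
        loss = 0.5 * (r @ r)
        if loss < best_loss:         # keep the best sparse iterate seen so far
            best_x, best_loss = x, loss
        # Straight-through style update: gradient taken at the sparse x,
        # applied to the dense exploration vector X.
        X = X - step * (A.T @ r)
    return best_x
```

Because the exploration vector keeps accumulating gradient information for indices outside the current support, an index can enter the support later even if it was not selected at first, which is the exploration behaviour the summary refers to.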

Abstract

The straight-through estimator (STE) is commonly used to optimize quantized neural networks, yet the contexts in which it performs effectively remain unclear despite its empirical successes. To make progress on this question, we apply STE to a well-understood problem: sparse support recovery. We introduce the Support Exploration Algorithm (SEA), a novel algorithm promoting sparsity, and we analyze its performance in support recovery (a.k.a. model selection) problems. SEA explores more supports than the state-of-the-art, leading to superior performance in experiments, especially when the columns of $A$ are strongly coherent. The theoretical analysis considers recovery guarantees when the linear measurement matrix $A$ satisfies the Restricted Isometry Property (RIP). The sufficient conditions for recovery are comparable to, but more stringent than, those of the state-of-the-art in sparse support recovery. Their significance lies mainly in their applicability to an instance of the STE.

Paper Structure

This paper contains 59 sections, 14 theorems, 103 equations, 36 figures, 6 algorithms.

Key Result

Theorem 4.1

Assume $A$ satisfies the $(2k+1)$-RIP and [...]. If, moreover, $x^*$ is such that [...] and SEA performs more than $T_{RIP}$ iterations, then $S^* \subse$...

Footnote: The normalization aims at simplifying formulas by guaranteeing that $\delta_1=0$. It is done at no expense since, if $A$ is not normalized but satisfies eq:RIP for $l>1$, its normalization only has a small impact on $\delta_l$. Indeed, considering $\Delta\in$ ...
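For readers unfamiliar with the notation, the RIP is usually stated as below. This is the standard textbook form and may differ in minor details from the paper's own eq:RIP; the symbol $A_i$ for the $i$-th column of $A$ and $e_i$ for the $i$-th canonical basis vector are introduced here for illustration. It also shows why normalized columns give $\delta_1=0$.

```latex
% Standard l-RIP with constant \delta_l \in [0,1): for every l-sparse vector x,
(1-\delta_l)\,\|x\|_2^2 \;\le\; \|Ax\|_2^2 \;\le\; (1+\delta_l)\,\|x\|_2^2 .
% For a 1-sparse vector x = c\,e_i we have \|Ax\|_2^2 = c^2\,\|A_i\|_2^2,
% so if every column satisfies \|A_i\|_2 = 1 the two bounds hold with \delta_1 = 0.
```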

Figures (36)

  • Figure 1: Overview of the main results. Left: phase transition diagram showing the recovery limits in dimension $n=500$ while sparsity $k$ and number of observations $m$ vary (the higher, the better, see details in Section \ref{dt-sec}). Right: spike deconvolution in dimension $m=n=500$: average distance between the supports of the solution $x^*$ and the estimations obtained from various algorithms, plotted against the sparsity level $k$ (the lower, the better, see details in Section \ref{deconv-sec}).
  • Figure 2: Phase transition diagram: each curve is the threshold below which the related algorithm recovers at least $95\%$ of the supports. $\zeta$ denotes the ratio between the number of rows and the number of columns in $A$ while $\rho$ denotes the ratio between the sparsity and the number of rows in $A$. Matrix $A$ has i.i.d. standard Gaussian entries and the non-zero entries of $x^*$ are drawn uniformly in $[-2, -1]\cup[1, 2]$. $n=500$ is fixed and results are obtained from $1000$ runs.
  • Figure 3: Spike deconvolution: representation of an instance of $x^*$ and $y$ with the solutions provided by the algorithms when $k = 20$. This is a cropped version of a crowded area (spikes are close).
  • Figure 4: Spike deconvolution: average support distance between $S^*$ and the support of the solutions provided by several algorithms as a function of the sparsity level $k$.
  • Figure 5: Visual representation of the main sets of indices encountered in the article.
  • ...and 31 more figures
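The synthetic setting described in the Figure 2 caption can be reproduced roughly as follows. This is a sketch under assumptions: the caption does not say how the support is chosen, whether noise is added, or how $m$ and $k$ are rounded, so uniform support sampling, noiseless observations, and nearest-integer rounding are choices made here, and `make_instance` is a hypothetical helper name.

```python
import numpy as np

def make_instance(n=500, zeta=0.5, rho=0.1, seed=0):
    """Draw one (A, x_star, y, support) instance in the Figure 2 setting."""
    rng = np.random.default_rng(seed)
    m = max(1, int(round(zeta * n)))   # zeta = m / n (rows over columns)
    k = max(1, int(round(rho * m)))    # rho = k / m (sparsity over rows)
    A = rng.standard_normal((m, n))    # i.i.d. standard Gaussian entries
    # Assumption: support drawn uniformly at random without replacement.
    support = np.sort(rng.choice(n, size=k, replace=False))
    x_star = np.zeros(n)
    # Non-zero entries uniform in [-2, -1] U [1, 2]: random sign times U[1, 2).
    magnitudes = rng.uniform(1.0, 2.0, size=k)
    signs = rng.choice([-1.0, 1.0], size=k)
    x_star[support] = signs * magnitudes
    y = A @ x_star                     # assumption: noiseless observations
    return A, x_star, y, support
```

Sweeping `zeta` and `rho` over a grid and repeating over many seeds would give the kind of empirical recovery rates that a phase transition diagram summarizes.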

Theorems & Definitions (23)

  • Theorem 4.1: Recovery - RIP case
  • Corollary 4.2: Noiseless recovery - simplified RIP case
  • Proposition A.1: Optimization problem equivalence
  • proof
  • Theorem C.1: Recovery - Oracle Update Rule
  • Lemma C.2
  • proof
  • Lemma C.3
  • proof
  • Theorem C.4: Recovery - General case
  • ...and 13 more