Precise asymptotics of reweighted least-squares algorithms for linear diagonal networks

Chiraag Kaushik, Justin Romberg, Vidya Muthukumar

TL;DR

This work develops a precise high-dimensional asymptotic theory for batched, Hadamard-parameterized reweighted least-squares algorithms applied to linear diagonal neural networks (LDNNs). Using the Convex Gaussian Min-Max Theorem, it derives a scalar recursion that characterizes the limiting joint law of the iterates and yields exact asymptotic predictions of the test error at each iteration. The framework subsumes alternating minimization, reparameterized IRLS, and lin-RFM, and extends to group-sparse signals, where group-aware reweighting aligns learning with the underlying structure and improves how the error scales with the number of nonzero groups. The resulting predictions make it possible to compare LDNN-associated algorithms iteration by iteration and to quantify the benefit of exploiting structured sparsity in high dimensions.

Abstract

The classical iteratively reweighted least-squares (IRLS) algorithm aims to recover an unknown signal from linear measurements by solving a sequence of weighted least-squares problems, where the weights are recursively updated at each step. Variants of this algorithm have been shown to achieve favorable empirical performance and theoretical guarantees for sparse recovery and $\ell_p$-norm minimization. Recently, some preliminary connections have also been made between IRLS and certain types of non-convex linear neural network architectures that are observed to exploit low-dimensional structure in high-dimensional linear models. In this work, we provide a unified asymptotic analysis for a family of algorithms that encompasses IRLS, the recently proposed lin-RFM algorithm (which was motivated by feature learning in neural networks), and the alternating minimization algorithm on linear diagonal neural networks. Our analysis operates in a "batched" setting with i.i.d. Gaussian covariates and shows that, with an appropriately chosen reweighting policy, the algorithm can achieve favorable performance in only a handful of iterations. We also extend our results to the case of group-sparse recovery and show that leveraging this structure in the reweighting scheme provably improves test error compared to coordinate-wise reweighting.
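To make the "sequence of weighted least-squares problems" concrete, the following is a minimal sketch of classical IRLS for noiseless sparse recovery, in the spirit of the equality-constrained formulation. It is not the batched algorithm analyzed in the paper. The function name `irls_sparse`, the geometric decay of the smoothing parameter `eps`, and all default values are assumptions made for illustration.

```python
import numpy as np

def irls_sparse(X, y, n_iter=30, eps=1.0, eps_decay=0.5, eps_min=1e-6):
    """Classical IRLS sketch: repeatedly solve
        min_theta  sum_i w_i * theta_i^2   subject to  X @ theta = y,
    with weights w_i = 1 / sqrt(theta_i^2 + eps^2) from the previous iterate.
    The eps schedule (geometric decay) is an illustrative assumption."""
    # Start from the minimum-l2-norm solution of X @ theta = y.
    theta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        # D holds the inverse weights 1 / w_i.
        D = np.sqrt(theta**2 + eps**2)
        XD = X * D  # same as X @ diag(D), via broadcasting over columns
        # Closed form of the constrained problem:
        # theta = D X^T (X D X^T)^{-1} y
        theta = D * (X.T @ np.linalg.solve(XD @ X.T, y))
        eps = max(eps * eps_decay, eps_min)
    return theta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 40, 100, 5
    X = rng.standard_normal((n, d))
    theta_star = np.zeros(d)
    theta_star[rng.choice(d, size=k, replace=False)] = 1.0
    theta_hat = irls_sparse(X, X @ theta_star)
    print("l2 recovery error:", np.linalg.norm(theta_hat - theta_star))
```

Each pass solves one weighted least-squares problem in closed form and then recomputes the weights from the new iterate, which is the recursion the abstract describes. Coordinates that shrink receive larger weights and are pushed further toward zero.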

Paper Structure

This paper contains 17 sections, 7 theorems, 88 equations, 3 figures, 1 table.

Key Result

Theorem 1

Suppose Assumptions assump:init and assump:psi are satisfied. Then, for any $t \geq 0$ and any function $g \colon \mathbb{R}^3 \to \mathbb{R}$ such that $g \in \text{PL}(2)$ or $g$ is bounded and continuous, we have […] where the expectation is over the independent random variables $(V,\Theta) \sim \Pi_t$ and $G_t \sim \mathcal{N}(0,1)$.
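For reference, $\text{PL}(2)$ presumably denotes the class of pseudo-Lipschitz functions of order 2, as is standard in this literature. Under that assumption, $g \colon \mathbb{R}^3 \to \mathbb{R}$ belongs to $\text{PL}(2)$ if there is a constant $L > 0$ (a symbol introduced here for illustration) such that

$$
|g(x) - g(y)| \;\leq\; L \left(1 + \lVert x \rVert_2 + \lVert y \rVert_2\right) \lVert x - y \rVert_2 \quad \text{for all } x, y \in \mathbb{R}^3 .
$$

As a worked instance, $g(a, b, c) = |ab - c|$ satisfies this bound with $L = 1$: writing $x = (a, b, c)$ and $y = (a', b', c')$,

$$
|g(x) - g(y)| \;\leq\; |a|\,|b - b'| + |b'|\,|a - a'| + |c - c'| \;\leq\; \left(1 + \lVert x \rVert_2 + \lVert y \rVert_2\right) \lVert x - y \rVert_2 .
$$

So products of coordinates, which are not Lipschitz, are still admissible test functions of this type.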

Figures (3)

  • Figure 1: Theoretical predictions and simulations of the test error $\frac{1}{d}\lVert\mathbf{u} \odot \mathbf{v} - \boldsymbol{\theta}^*\rVert_1$ (log scale; pluses denote the median over 100 trials and the shaded region indicates the interquartile range) for two different noise levels, where $n = 250$, $d = 2000$, and $\boldsymbol{\theta}^*$ has $\text{Bernoulli}(0.01)$ entries. Here, $\psi = |uv|^{\frac{1}{2}}$ corresponds to the classical IRLS weighting from Daubechies et al. (2010), $\psi = \tanh{|uv|}$ is a version of lin-RFM, $\psi = u$ corresponds to AM, and $\psi = \tanh{u^2}$ is a new reweighting scheme we introduce. We note that weighting functions $\psi$ that depend only on $\mathbf{u}$ can lead to oscillatory behavior in the test risk.
  • Figure 2: Group-blind ($\psi_{gb}$) vs. group-aware ($\psi_{ga}$) reweighting when $\boldsymbol{\theta}^*$ has group-sparse structure. We set $n=500$, $d=4000$, $\sigma=0.1$, and $\boldsymbol{\theta}^*_i \overset{\text{i.i.d.}}{\sim} \text{Bernoulli}(0.01)\mathbf{1}_b$. For each curve, $\lambda$ is set to minimize the asymptotic test error achieved. Simulation results are the median/IQR over 100 trials. Left: Comparison of the test error trajectory (log scale) for a fixed block size $b=8$. Right: $\ell_1$ test error after $T=4$ iterations, for varying group sizes.
  • Figure 3: Here, we fix $n=250$, $d=2000$, $\sigma = 0.1$, $\theta^*_i \overset{\text{i.i.d.}}{\sim} \text{Bernoulli}(0.01)$ and select $\lambda$ to minimize the predicted asymptotic loss. Plus marks denote the median over 100 trials, and the shaded region indicates the interquartile range. Left: Predictions and simulations for weighting functions which are not uniformly bounded. Right: Predictions and simulations for the squared error $\frac{1}{d}\lVert\mathbf{u} \odot \mathbf{v} - \boldsymbol{\theta}^*\rVert_2^2$.
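The simulations described in these captions can be approximated with a short script. The sketch below is one plausible reading of a batched, Hadamard-parameterized reweighted least-squares loop, not the authors' code. Assumptions: a fresh Gaussian batch is drawn at every iteration, the $\mathbf{v}$-update is a ridge regression on the reweighted features with an unnormalized penalty `lam`, $\mathbf{u}$ is initialized to all ones, and the reweighting is applied as `psi(u, v)`. Only the two weightings from Figure 1 that are symmetric in $u$ and $v$ are shown, so the argument convention does not affect them. All function names are hypothetical.

```python
import numpy as np

def psi_irls(u, v):
    # psi = |uv|^(1/2): the classical IRLS weighting named in Figure 1.
    return np.sqrt(np.abs(u * v))

def psi_lin_rfm(u, v):
    # psi = tanh|uv|: the lin-RFM-style weighting named in Figure 1.
    return np.tanh(np.abs(u * v))

def batched_reweighted_ls(theta_star, n, T, psi, lam=0.1, sigma=0.1, seed=0):
    """Run T reweighting steps and return the l1 test error
    (1/d) * ||u * v - theta_star||_1 after each v-update."""
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    u = np.ones(d)  # assumed initialization
    errors = []
    for _ in range(T):
        # Fresh batch with i.i.d. Gaussian covariates (assumed sample splitting).
        X = rng.standard_normal((n, d))
        y = X @ theta_star + sigma * rng.standard_normal(n)
        # Ridge step: v = argmin ||y - X (u * v)||^2 + lam * ||v||^2,
        # solved in the n x n dual form since d > n.
        A = X * u
        v = A.T @ np.linalg.solve(A @ A.T + lam * np.eye(n), y)
        errors.append(np.abs(u * v - theta_star).mean())
        # Reweighting step.
        u = psi(u, v)
    return np.array(errors)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, n = 2000, 250
    theta_star = rng.binomial(1, 0.01, size=d).astype(float)
    for name, psi in [("IRLS", psi_irls), ("lin-RFM", psi_lin_rfm)]:
        errs = batched_reweighted_ls(theta_star, n=n, T=5, psi=psi)
        print(name, errs)
```

The demo uses the dimensions from Figure 1 ($n = 250$, $d = 2000$) with a single trial. Reproducing the plotted medians and interquartile ranges would require repeating the run over many seeds and tuning `lam`, which this sketch does not attempt.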

Theorems & Definitions (14)

  • Theorem 1
  • Theorem 2
  • Definition 1
  • Theorem 3: Convex Gaussian Min-Max Theorem (Thrampoulidis et al., 2018)
  • proof: Proof of Theorem thm:main
  • proof: Proof of Theorem thm:group
  • Proposition 1
  • proof
  • Lemma 1
  • proof
  • ...and 4 more