Precise asymptotics of reweighted least-squares algorithms for linear diagonal networks
Chiraag Kaushik, Justin Romberg, Vidya Muthukumar
TL;DR
This work develops a precise high-dimensional asymptotic theory for batched, Hadamard-parameterized reweighted least-squares algorithms applied to linear diagonal neural networks (LDNNs). It uses the Convex Gaussian Min-Max Theorem (CGMT) to characterize the distribution of the iterates, deriving a scalar recursion that yields their limiting joint law and enables exact asymptotic predictions of the test error at every iteration. The framework subsumes alternating minimization, reparameterized IRLS, and lin-RFM. It also extends to grouped sparsity, where group-aware reweighting aligns learning with the underlying group structure and improves how the error scales with the number of nonzero groups. These results provide a predictive tool for comparing LDNN-associated algorithms and demonstrate concrete benefits from exploiting structured sparsity in high dimensions.
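To make the batched, Hadamard-parameterized loop concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. The function names (`batched_reweighted_ls`, `psi_coordinate`, `make_psi_group`), the minimum-norm least-squares step, and the specific reweighting rules $w_i = \sqrt{|\beta_i|}$ (coordinate-wise) and $w_i = \sqrt{\lVert \beta_{g(i)} \rVert_2}$ (group-aware) are hypothetical choices for illustration. The framework described above covers a broader family of reweighting policies. Each round draws a fresh Gaussian batch, fits $\beta = w \odot \theta$ by least squares in $\theta$, and then recomputes the weights from the new iterate.

```python
import numpy as np

def batched_reweighted_ls(beta_star, n, num_iters, psi, noise_std=0.0, seed=0):
    """Hypothetical sketch of a batched, Hadamard-parameterized reweighted LS loop.

    Each iteration draws a fresh Gaussian batch (X_t, y_t), computes the
    minimum-norm solution theta of min ||y_t - X_t diag(w) theta||_2, sets
    beta = w * theta, and then updates the weights via w = psi(beta).
    """
    rng = np.random.default_rng(seed)
    d = beta_star.shape[0]
    w = np.ones(d)  # uniform weights: the first step is plain min-norm least squares
    errors = []
    for _ in range(num_iters):
        X = rng.standard_normal((n, d))  # fresh i.i.d. N(0, 1) covariates each round
        y = X @ beta_star + noise_std * rng.standard_normal(n)
        theta = np.linalg.lstsq(X * w, y, rcond=None)[0]  # X * w equals X @ diag(w)
        beta = w * theta  # Hadamard parameterization: beta = w (elementwise) theta
        errors.append(np.linalg.norm(beta - beta_star))
        w = psi(beta)  # reweighting policy (illustrative assumption)
    return beta, errors

def psi_coordinate(beta):
    # Illustrative coordinate-wise policy: w_i = sqrt(|beta_i|)
    return np.sqrt(np.abs(beta))

def make_psi_group(group_size):
    # Illustrative group-aware policy: every coordinate in group g gets sqrt(||beta_g||_2)
    def psi(beta):
        norms = np.linalg.norm(beta.reshape(-1, group_size), axis=1)
        return np.repeat(np.sqrt(norms), group_size)
    return psi

# Group-sparse ground truth: d = 200, groups of size 4, 3 active groups.
d, g = 200, 4
beta_star = np.zeros(d)
beta_star[: 3 * g] = 1.0
_, errs_coord = batched_reweighted_ls(beta_star, n=100, num_iters=6, psi=psi_coordinate)
_, errs_group = batched_reweighted_ls(beta_star, n=100, num_iters=6, psi=make_psi_group(g))
print("coordinate-wise:", np.round(errs_coord, 3))
print("group-aware:    ", np.round(errs_group, 3))
```

Swapping `psi` is the only change needed to move between the coordinate-wise and group-aware variants, which mirrors how the reweighting policy is the main design choice in this family of algorithms. A small simulation like this only illustrates the mechanics; it is not a substitute for the asymptotic error predictions described above.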
Abstract
The classical iteratively reweighted least-squares (IRLS) algorithm aims to recover an unknown signal from linear measurements by solving a sequence of weighted least-squares problems, where the weights are recursively updated at each step. Variants of this algorithm have been shown to achieve favorable empirical performance and theoretical guarantees for sparse recovery and $\ell_p$-norm minimization. Recently, preliminary connections have also been drawn between IRLS and certain non-convex linear neural network architectures that are observed to exploit low-dimensional structure in high-dimensional linear models. In this work, we provide a unified asymptotic analysis for a family of algorithms that encompasses IRLS, the recently proposed lin-RFM algorithm (which was motivated by feature learning in neural networks), and alternating minimization on linear diagonal neural networks. Our analysis operates in a "batched" setting with i.i.d. Gaussian covariates and shows that, with an appropriately chosen reweighting policy, the algorithm can achieve favorable performance in only a handful of iterations. We also extend our results to group-sparse recovery and show that leveraging this structure in the reweighting scheme provably improves test error compared to coordinate-wise reweighting.
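For reference, one classical IRLS step for $\ell_p$ minimization solves $\min_\beta \beta^\top W^{-1} \beta$ subject to $X\beta = y$, whose closed-form solution is $\beta = W X^\top (X W X^\top)^{-1} y$, and then recomputes the diagonal weight matrix $W$ from the new iterate. The sketch below assumes $\varepsilon$-smoothed weights $W_{ii} = (\beta_i^2 + \varepsilon)^{1 - p/2}$, of the kind used in standard $\ell_p$ IRLS variants, together with a simple geometric schedule for $\varepsilon$. The function name `irls_lp` and all parameter values are illustrative and are not taken from the paper.

```python
import numpy as np

def irls_lp(X, y, p=1.0, num_iters=30, eps=1.0, eps_min=1e-8):
    """Sketch of classical IRLS for min ||beta||_p subject to X beta = y (0 < p <= 1)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the min-l2-norm solution
    for _ in range(num_iters):
        # Diagonal of W: large for big coordinates, small for near-zero ones
        w = (beta**2 + eps) ** (1 - p / 2)
        # Weighted least-squares step: beta = W X^T (X W X^T)^{-1} y
        z = np.linalg.solve((X * w) @ X.T, y)
        beta = w * (X.T @ z)
        eps = max(eps / 10, eps_min)  # gradually remove the smoothing (assumed schedule)
    return beta

# Noiseless sparse recovery example: n = 80 measurements, d = 200, k = 5 nonzeros.
rng = np.random.default_rng(0)
n, d, k = 80, 200, 5
beta_star = np.zeros(d)
beta_star[rng.choice(d, size=k, replace=False)] = rng.standard_normal(k)
X = rng.standard_normal((n, d))
y = X @ beta_star
beta_l2 = np.linalg.lstsq(X, y, rcond=None)[0]
beta_irls = irls_lp(X, y, p=1.0)
err_l2 = np.linalg.norm(beta_l2 - beta_star)
err_irls = np.linalg.norm(beta_irls - beta_star)
print(f"min-norm error: {err_l2:.3f}, IRLS error: {err_irls:.2e}")
```

Unlike the batched setting analyzed in the paper, this classical version reuses the same measurements $(X, y)$ at every iteration.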
