Solving Inverse Problems with Deep Linear Neural Networks: Global Convergence Guarantees for Gradient Descent with Weight Decay
Hannah Laus, Suzanna Parkinson, Vasileios Charisopoulos, Felix Krahmer, Rebecca Willett
TL;DR
The paper analyzes underdetermined linear inverse problems solved by deep linear networks trained with gradient descent and weight decay. It proves that such training learns an approximate inverse mapping that adapts to the latent subspace structure of the signals: reconstruction is accurate and robust to noise on the subspace, and the behavior of the learned mapping off the subspace is controlled. The guarantees hold under restricted isometry property (RIP)-like assumptions and a suitable weight initialization. A three-phase convergence argument shows fast initial reconstruction, followed by stabilization and, eventually, off-subspace convergence, with explicit bounds that tie reconstruction error and noise robustness to the regularization strength and the network width. The results highlight the complementary roles of regularization and overparameterization: weight decay improves robustness and generalization, while overparameterization also accelerates convergence during training. This offers a principled explanation for empirically observed benefits and a starting point for nonlinear extensions.
Abstract
Machine learning methods are commonly used to solve inverse problems, wherein an unknown signal must be estimated from few indirect measurements generated via a known acquisition procedure. In particular, neural networks perform well empirically but have limited theoretical guarantees. In this work, we study an underdetermined linear inverse problem that admits several possible solution operators that map measurements to estimates of the target signal. A standard remedy (e.g., in compressed sensing) for establishing the uniqueness of the solution mapping is to assume the existence of a latent low-dimensional structure in the source signal. We ask the following question: do deep linear neural networks adapt to unknown low-dimensional structure when trained by gradient descent with weight decay regularization? We prove that mildly overparameterized deep linear networks trained in this manner converge to an approximate solution mapping that accurately solves the inverse problem while implicitly encoding latent subspace structure. We show rigorously that deep linear networks trained with weight decay automatically adapt to latent subspace structure in the data under practical stepsize and weight initialization schemes. Our work highlights that regularization and overparameterization improve generalization, while overparameterization also accelerates convergence during training.
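The setting described above can be illustrated with a small numerical sketch. The code below is not the authors' implementation: the function name train_deep_linear, the problem sizes, the loss normalization, the Gaussian initialization scaled to roughly unit spectral norm, and the values of lam, step, and iters are all assumptions chosen for illustration. Signals are drawn from a low-dimensional subspace with orthonormal basis R, measured with a random matrix A with fewer rows than columns, and a three-layer linear network is trained by gradient descent with weight decay to map measurements back to signals.

```python
import numpy as np


def train_deep_linear(X, Y, dims, lam=0.02, step=0.01, iters=20000, seed=0):
    """Gradient descent on (1/2n)||W_L...W_1 Y - X||_F^2 + (lam/2) sum_i ||W_i||_F^2.

    Returns the initial and the trained list of layers [W_1, ..., W_L].
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # Illustrative initialization (an assumption, not the paper's scheme):
    # Gaussian layers scaled so that each has spectral norm of roughly one.
    Ws = [rng.standard_normal((dims[i + 1], dims[i]))
          / (np.sqrt(dims[i]) + np.sqrt(dims[i + 1]))
          for i in range(len(dims) - 1)]
    W0 = [W.copy() for W in Ws]
    # The data term depends on the data only through these second moments.
    Syy, Sxy = Y @ Y.T / n, X @ Y.T / n
    for _ in range(iters):
        # prefix[i] = W_i ... W_1 (identity for i = 0)
        prefix = [np.eye(dims[0])]
        for W in Ws:
            prefix.append(W @ prefix[-1])
        # suffix[i] = W_L ... W_{i+1} (identity for i = L)
        suffix = [np.eye(dims[-1])]
        for W in reversed(Ws):
            suffix.append(suffix[-1] @ W)
        suffix = suffix[::-1]
        # Gradient of the data term with respect to the end-to-end product.
        G = prefix[-1] @ Syy - Sxy
        # Chain rule per layer, plus the weight decay term lam * W.
        Ws = [W - step * (suffix[i + 1].T @ G @ prefix[i].T + lam * W)
              for i, W in enumerate(Ws)]
    return W0, Ws


# Hypothetical problem sizes: ambient dimension d, m < d measurements,
# latent subspace dimension s, and n training samples.
rng = np.random.default_rng(1)
d, m, s, n = 20, 12, 2, 200
R, _ = np.linalg.qr(rng.standard_normal((d, s)))  # orthonormal subspace basis
A = rng.standard_normal((m, d)) / np.sqrt(m)      # random measurement matrix
X = R @ rng.standard_normal((s, n))               # signals in the subspace
Y = A @ X                                         # underdetermined measurements

W0, Ws = train_deep_linear(X, Y, dims=[m, 32, 32, d])
P0 = np.linalg.multi_dot(W0[::-1])  # end-to-end map at initialization
P = np.linalg.multi_dot(Ws[::-1])   # end-to-end map after training

# On-subspace reconstruction error on the training measurements.
on_err = np.linalg.norm(P @ Y - X) / np.linalg.norm(X)

# Off-subspace response: action on measurements orthogonal to range(A R).
Q, _ = np.linalg.qr(A @ R, mode="complete")
N = Q[:, s:]
off0, off = np.linalg.norm(P0 @ N), np.linalg.norm(P @ N)

print(f"relative on-subspace error: {on_err:.3f}")
print(f"off-subspace response: {off0:.3f} at init, {off:.3f} after training")
```

In this sketch the data gradient of the first layer has no component orthogonal to range(A R), so that part of the first layer shrinks only through the weight decay factor (1 - step * lam) at each iteration. This is why the off-subspace response decays much more slowly than the reconstruction error, and why a larger lam speeds up that decay at the cost of a larger bias on the subspace.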
