Diagonal Linear Networks and the Lasso Regularization Path
Raphaël Berthier
TL;DR
This work establishes a dynamical link between the training of diagonal linear networks (DLNs) and the classical lasso regularization path. Analyzing gradient flow in the small-initialization regime, the author shows that the time-rescaled, averaged DLN trajectory tracks the lasso path, with the rescaled time playing the role of the inverse regularization parameter. The connection is exact under a monotonicity assumption on the lasso path and approximate, in a quantified sense, otherwise. The proofs rely on a mirror-flow interpretation of the dynamics and on linear complementarity problems. The analysis covers both the uv and the u^2 parametrizations, reducing the uv case to the u^2 case, and is complemented by simulations illustrating the trade-off between sparsity and data fit that early stopping achieves. The results add a dynamical, path-following perspective to the study of implicit regularization in DLNs and suggest how similar reductions could extend such analyses to more complex architectures.
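The role of averaging can be checked numerically in the simplest case: a separable problem (orthogonal design), where the lasso path is coordinatewise soft-thresholding, β_λ,i = sign(z_i) max(|z_i| − λ, 0). The Python sketch below is an illustration under our own assumptions, not the paper's code. It uses the parametrization β = u^2 − v^2 with u = v = α at initialization, the loss ½ Σ_i (β_i − z_i)^2, and small-step gradient descent as a stand-in for gradient flow. It also assumes the time-to-regularization map λ(t) = log(1/α)/(2t). The factor 2 in that map comes from these particular conventions and may differ from the paper's normalization. The helpers `soft_threshold` and `dln_averaged_trajectory` are hypothetical names.

```python
import math


def soft_threshold(z, lam):
    """Lasso solution for one coordinate of an orthogonal design:
    argmin_b 0.5 * (b - z)**2 + lam * |b|."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def dln_averaged_trajectory(z, alpha=1e-4, dt=1e-3, t_max=20.0,
                            checkpoints=(3.0, 8.0, 20.0)):
    """Hypothetical helper. Runs small-step gradient descent (a proxy for
    gradient flow) on L = 0.5 * sum_i (u_i**2 - v_i**2 - z_i)**2 from
    u = v = alpha, and returns the running time average
    (1/t) * integral_0^t beta(s) ds at each checkpoint time t."""
    d = len(z)
    u = [alpha] * d
    v = [alpha] * d
    integral = [0.0] * d  # left Riemann sum of beta(s) ds
    check_steps = {round(c / dt): c for c in checkpoints}
    averages = {}
    for step in range(1, round(t_max / dt) + 1):
        for i in range(d):
            beta = u[i] ** 2 - v[i] ** 2
            grad = beta - z[i]              # dL/dbeta_i
            integral[i] += beta * dt
            u[i] -= dt * 2.0 * u[i] * grad  # dL/du_i = 2 u_i (beta_i - z_i)
            v[i] += dt * 2.0 * v[i] * grad  # dL/dv_i = -2 v_i (beta_i - z_i)
        if step in check_steps:
            t = step * dt
            averages[check_steps[step]] = [s / t for s in integral]
    return averages


if __name__ == "__main__":
    z = [2.0, 1.0, -0.5]
    alpha = 1e-4
    for t, beta_bar in sorted(dln_averaged_trajectory(z, alpha=alpha).items()):
        # Assumed time-to-regularization map for this parametrization.
        lam = math.log(1.0 / alpha) / (2.0 * t)
        lasso = [soft_threshold(zi, lam) for zi in z]
        print(f"t={t:4.1f}  lambda={lam:.3f}  "
              f"averaged DLN={[round(b, 3) for b in beta_bar]}  "
              f"lasso={[round(b, 3) for b in lasso]}")
```

In this toy setting, each raw coordinate β_i(t) stays near zero until roughly t ≈ log(1/α)/(2|z_i|) and then moves quickly to z_i. It is therefore the running average, not the raw iterate, that reproduces the continuous soft-thresholding path. Averaging works here because the gradient of the quadratic loss is affine in β, so its time integral equals t times the gradient at the averaged predictor. The remaining gap between the two columns is a finite-α effect that shrinks as α decreases.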
Abstract
Diagonal linear networks are neural networks with linear activation and diagonal weight matrices. Their theoretical interest is that their implicit regularization can be rigorously analyzed: from a small initialization, the training of diagonal linear networks converges to the linear predictor with minimal 1-norm among minimizers of the training loss. In this paper, we deepen this analysis by showing that the full training trajectory of diagonal linear networks is closely related to the lasso regularization path. In this connection, the training time plays the role of an inverse regularization parameter. Both rigorous results and simulations are provided to illustrate this conclusion. Under a monotonicity assumption on the lasso regularization path, the connection is exact, while in the general case, we show an approximate connection.
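The endpoint of this picture, convergence to the minimal 1-norm interpolator, can be seen on a one-sample toy problem. In the sketch below (our own illustration, with the hypothetical helper `train_dln`), the constraint β_1 + 2β_2 = 2 has minimal 1-norm solution (0, 1) and minimal 2-norm solution (0.4, 0.8). We assume the parametrization β = u^2 − v^2, the initialization u = v = α = 1e-3, and plain gradient descent with a small step as a proxy for gradient flow.

```python
def train_dln(x, y, alpha=1e-3, lr=1e-3, n_steps=5000):
    """Hypothetical helper: gradient descent on L = 0.5 * (<x, beta> - y)**2,
    with beta = u**2 - v**2 elementwise, started from u = v = alpha."""
    d = len(x)
    u = [alpha] * d
    v = [alpha] * d
    for _ in range(n_steps):
        beta = [ui * ui - vi * vi for ui, vi in zip(u, v)]
        r = sum(xi * bi for xi, bi in zip(x, beta)) - y  # residual
        for i in range(d):
            g = r * x[i]                 # dL/dbeta_i
            u[i] -= lr * 2.0 * u[i] * g  # dL/du_i = 2 u_i g
            v[i] += lr * 2.0 * v[i] * g  # dL/dv_i = -2 v_i g
    return [ui * ui - vi * vi for ui, vi in zip(u, v)]


if __name__ == "__main__":
    x, y = [1.0, 2.0], 2.0
    beta = train_dln(x, y)
    norm_sq = sum(xi * xi for xi in x)
    min_l2 = [xi * y / norm_sq for xi in x]  # minimal 2-norm interpolator
    min_l1 = [0.0, 1.0]                      # minimal 1-norm interpolator
    print("DLN:", [round(b, 4) for b in beta])
    print("min 1-norm:", min_l1, " min 2-norm:", min_l2)
```

With a small α, the trained predictor should end close to (0, 1), with a first coordinate of order α. It should not land near the minimal 2-norm solution (0.4, 0.8), which is what gradient descent run directly on β from zero would reach.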
