Manipulating Sparse Double Descent
Ya Shi Zhang
TL;DR
This work studies how sparsity can be used to understand and manipulate double descent in neural networks. It adopts a kernel-learning interpretation in which the first layers learn a kernel and the final layer performs linear regression in that kernel space. Sparsity is induced by $L_1$ regularization, which serves as a convex surrogate for the $L_0$ objective. Empirically, varying the kernel dimension and the $L_1$ strength $\alpha$ reveals sparse double descent, with minima locations that are invariant with respect to $\alpha$. These findings have practical implications for pruning and kernel-based regularization, and they motivate validating the phenomena on more complex models and more diverse datasets.
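For concreteness, the final-layer problem can be written as follows. This is a sketch under stated assumptions, not the paper's exact formulation: a squared loss over $n$ training pairs $(x_i, y_i)$, and $\phi_D$ denoting the $D$-dimensional representation produced by the first layers (notation introduced here for illustration).

$$
\begin{aligned}
\hat{w}_{0} &= \arg\min_{w \in \mathbb{R}^{D}} \; \frac{1}{2n} \sum_{i=1}^{n} \bigl(y_i - w^{\top}\phi_D(x_i)\bigr)^2 + \lambda \|w\|_0, \\
\hat{w}_{1} &= \arg\min_{w \in \mathbb{R}^{D}} \; \frac{1}{2n} \sum_{i=1}^{n} \bigl(y_i - w^{\top}\phi_D(x_i)\bigr)^2 + \alpha \|w\|_1 .
\end{aligned}
$$

Here $\|w\|_0$ counts the nonzero weights, which makes the first problem combinatorial. $\|w\|_1 = \sum_j |w_j|$ is the convex envelope of $\|w\|_0$ on the box $\|w\|_\infty \le 1$, which is why $L_1$ regularization is the standard convex surrogate.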
Abstract
This paper investigates the double descent phenomenon in two-layer neural networks, focusing on the roles of $L_1$ regularization and representation dimension. It explores a variant of double descent, termed sparse double descent. The study highlights the complex relationship between model complexity, sparsity, and generalization, and suggests further research on more diverse models and datasets. The findings contribute to a deeper understanding of neural network training and optimization.
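The interplay between $L_1$ strength and representation dimension can be explored with a small experiment. The sketch below rests on several assumptions that do not come from the paper. The first layers are replaced by a fixed random ReLU layer rather than a learned one. The data come from a synthetic noisy linear teacher. The $L_1$-regularized final layer is fit with ISTA (proximal gradient descent). All function names (`relu_features`, `soft_threshold`, `lasso_ista`, `sweep`) are hypothetical. For each representation dimension $D$ and strength $\alpha$, the sketch records the test MSE and the number of nonzero final-layer weights.

```python
import numpy as np


def relu_features(X, W, b):
    # Hypothetical stand-in for the "first layers": a FIXED random ReLU layer
    # (the paper's layers are learned) mapping inputs to a D-dim representation.
    return np.maximum(X @ W + b, 0.0)


def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1: shrinks each coordinate toward zero
    # by t and sets coordinates with |z_j| <= t exactly to zero.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_ista(Phi, y, alpha, n_iter=500):
    # Final linear layer: minimizes (1/2n)||y - Phi w||^2 + alpha * ||w||_1
    # with ISTA, i.e. a gradient step on the squared loss followed by soft-thresholding.
    n, d = Phi.shape
    w = np.zeros(d)
    L = np.linalg.norm(Phi, 2) ** 2 / n  # Lipschitz constant of the squared-loss gradient
    if L == 0.0:
        return w
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ w - y) / n
        w = soft_threshold(w - grad / L, alpha / L)
    return w


def sweep(dims, alphas, n_train=100, n_test=500, in_dim=10, noise=0.1, seed=0):
    # Assumed toy data: a noisy linear teacher, not the paper's datasets.
    rng = np.random.default_rng(seed)
    w_star = rng.normal(size=in_dim)
    X_tr = rng.normal(size=(n_train, in_dim))
    X_te = rng.normal(size=(n_test, in_dim))
    y_tr = X_tr @ w_star + noise * rng.normal(size=n_train)
    y_te = X_te @ w_star
    # Draw the widest random layer once; a smaller D uses its first D units,
    # so the representations are nested as D grows.
    D_max = max(dims)
    W = rng.normal(size=(in_dim, D_max)) / np.sqrt(in_dim)
    b = rng.normal(size=D_max)
    results = {}
    for D in dims:
        Phi_tr = relu_features(X_tr, W[:, :D], b[:D])
        Phi_te = relu_features(X_te, W[:, :D], b[:D])
        for alpha in alphas:
            w = lasso_ista(Phi_tr, y_tr, alpha)
            test_mse = float(np.mean((Phi_te @ w - y_te) ** 2))
            results[(D, alpha)] = (test_mse, int(np.count_nonzero(w)))
    return results


if __name__ == "__main__":
    grid = sweep([25, 50, 100, 200, 400], [0.0, 0.01, 0.1])
    for (D, alpha), (mse, nnz) in sorted(grid.items()):
        print(f"D={D:4d} alpha={alpha:<5} test MSE={mse:.3f} nonzeros={nnz}")
```

Plotting test MSE against $D$ for each $\alpha$ is one way to look for a double-descent peak near $D \approx$ `n_train`. Because ISTA runs for a fixed number of iterations, the $\alpha = 0$ runs are implicitly early-stopped, which can damp that peak. Raising `n_iter` reduces this effect.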
