Manipulating Sparse Double Descent

Ya Shi Zhang

TL;DR

This work studies how to understand and manipulate double descent in neural networks via sparsity. It adopts a kernel-learning interpretation in which the first layers learn a kernel and the final layer performs linear regression in that kernel space, with sparsity induced by $L_1$ regularization as a convex surrogate for the $L_0$ objective. Empirically, varying the kernel dimension and the $L_1$ strength reveals sparse double descent, with the locations of the minima invariant with respect to $\alpha$. The findings have practical implications for pruning and kernel-based regularization, and they motivate validating these phenomena on more complex models and more diverse datasets.
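
The relaxation described above can be written out explicitly. The following is a sketch under assumed notation, not the paper's own formulation: $\phi_\theta$ stands for the representation produced by the first layers, $w$ for the final-layer weights, $n$ for the number of training points, and $\alpha$ is taken here to be the regularization strength. The penalized form of the $L_0$ problem is also an assumption; a constrained form would serve equally well.

```latex
\begin{align}
  \text{($L_0$ objective)} \quad
    & \min_{w} \; \frac{1}{2n} \sum_{i=1}^{n}
      \bigl( y_i - w^\top \phi_\theta(x_i) \bigr)^2
      + \alpha \, \lVert w \rVert_0, \\
  \text{($L_1$ surrogate)} \quad
    & \min_{w} \; \frac{1}{2n} \sum_{i=1}^{n}
      \bigl( y_i - w^\top \phi_\theta(x_i) \bigr)^2
      + \alpha \, \lVert w \rVert_1 .
\end{align}
```

Here $\lVert w \rVert_0$ counts the nonzero entries of $w$, which makes the first problem combinatorial, while $\lVert w \rVert_1 = \sum_j \lvert w_j \rvert$ is convex and still drives many entries of $w$ exactly to zero.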

Abstract

This paper investigates the double descent phenomenon in two-layer neural networks, focusing on the role of L1 regularization and representation dimensions. It explores an alternative double descent phenomenon, named sparse double descent. The study emphasizes the complex relationship between model complexity, sparsity, and generalization, and suggests further research into more diverse models and datasets. The findings contribute to a deeper understanding of neural network training and optimization.
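
To make the role of $L_1$ regularization concrete, the sketch below fits an $L_1$-penalized linear regression and counts how many weights remain nonzero as the penalty grows. It is a minimal illustration under stated assumptions, not the paper's experimental setup: the features are fixed Gaussian draws standing in for a learned representation, the data are synthetic, the solver is plain coordinate descent, and the names `soft_threshold`, `lasso_cd`, and `make_data` are hypothetical helpers introduced here.

```python
import random


def soft_threshold(z, t):
    """Proximal operator of t * |w|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, alpha, n_iters=200):
    """Coordinate descent for (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw, since w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iters):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)
            ) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def make_data(n=40, d=15, seed=0):
    """Synthetic regression data (an illustrative assumption)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    # Only the first three features carry signal; the rest are noise.
    y = [
        2.0 * row[0] - 1.5 * row[1] + row[2] + 0.1 * rng.gauss(0.0, 1.0)
        for row in X
    ]
    return X, y


def count_nonzero(w):
    return sum(1 for wj in w if wj != 0.0)


X, y = make_data()
for alpha in (0.0, 0.05, 0.5, 10.0):
    w = lasso_cd(X, y, alpha)
    print(f"alpha={alpha}: {count_nonzero(w)} nonzero weights out of {len(w)}")
```

Sweeping `alpha` together with the feature dimension `d`, and recording test error at each setting, is one way such a toy setup could be extended toward the kind of experiment the abstract describes.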

Paper Structure (6 sections, 1 figure, 1 table)

This paper contains 6 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: A simple depiction of the double descent phenomenon. $d$ is the number of parameters in the model and $n$ is the number of training data points.