Manipulating Sparse Double Descent

Ya Shi Zhang


TL;DR

This work studies how to understand and manipulate double descent in neural networks through sparsity. It adopts a kernel-learning interpretation in which the first layers learn a kernel and the final layer performs linear regression in that kernel space, with sparsity induced by $L_1$ regularization as a convex surrogate for the $L_0$ objective. Empirically, varying the kernel dimension and the $L_1$ strength reveals sparse double descent, with minima whose locations are invariant with respect to $\alpha$. The findings have practical implications for pruning and kernel-based regularization, and motivate validating these phenomena on more complex models and diverse datasets.
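As a minimal worked formulation of the final-layer problem, assume a squared loss, a feature map $\phi:\mathbb{R}^{p}\to\mathbb{R}^{k}$ produced by the first layers (so $k$ is the kernel dimension), training pairs $(x_i, y_i)_{i=1}^{n}$, and a regularization strength $\alpha$. All of these symbols are illustrative assumptions, not necessarily the paper's exact notation. The $L_0$ objective and its $L_1$ surrogate would then read:

$$
\min_{w\in\mathbb{R}^{k}}\ \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - w^{\top}\phi(x_i)\bigr)^2 + \alpha\,\lVert w\rVert_0
\qquad\longrightarrow\qquad
\min_{w\in\mathbb{R}^{k}}\ \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - w^{\top}\phi(x_i)\bigr)^2 + \alpha\,\lVert w\rVert_1
$$

Here $\lVert w\rVert_0$ counts the non-zero weights and makes the problem combinatorial, while $\lVert w\rVert_1 = \sum_{j}\lvert w_j\rvert$ is its convex envelope on the box $\lVert w\rVert_\infty \le 1$. The relaxed problem is therefore a convex lasso regression whose solutions still contain exact zeros once $\alpha$ is large enough.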

Abstract

This paper investigates the double descent phenomenon in two-layer neural networks, focusing on the role of $L_1$ regularization and of the representation dimension. It explores an alternative form of double descent, termed sparse double descent. The study highlights the interplay between model complexity, sparsity, and generalization, and suggests further research on more diverse models and datasets. The findings contribute to a deeper understanding of neural network training and optimization.
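To make the kernel-learning view concrete, here is a minimal sketch under loudly stated assumptions: a fixed random ReLU layer (the hypothetical `relu_features`) stands in for the learned first layers, the kernel dimension is the number of features `k`, and the final layer is fit as a lasso solved with ISTA (proximal gradient descent), which is not necessarily the optimizer used in the paper. The toy data and sizes are arbitrary. It shows how increasing the $L_1$ strength `alpha` drives last-layer weights to exact zeros.

```python
import numpy as np


def relu_features(X, W):
    """Hypothetical fixed first layer: phi(x) = max(0, W x), one column per feature."""
    return np.maximum(0.0, X @ W.T)


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrinks every entry towards zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_objective(Phi, y, w, alpha):
    """(1 / 2n) * ||y - Phi w||^2 + alpha * ||w||_1."""
    n = Phi.shape[0]
    residual = y - Phi @ w
    return residual @ residual / (2 * n) + alpha * np.abs(w).sum()


def lasso_ista(Phi, y, alpha, n_iter=2000):
    """Minimise lasso_objective with ISTA, starting from w = 0."""
    n, k = Phi.shape
    # Step size 1/L, where L is the Lipschitz constant of the squared-loss gradient.
    L = np.linalg.norm(Phi, 2) ** 2 / n
    w = np.zeros(k)
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ w - y) / n
        w = soft_threshold(w - grad / L, alpha / L)
    return w


# Toy data (assumed): a 1-D target hidden in 5 input dimensions.
rng = np.random.default_rng(0)
n, d_in, k = 100, 5, 200  # k plays the role of the kernel dimension
X = rng.standard_normal((n, d_in))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
Phi = relu_features(X, rng.standard_normal((k, d_in)))

for alpha in (1e-3, 1e-2, 1e-1):
    w = lasso_ista(Phi, y, alpha)
    print(f"alpha={alpha:g}: {np.count_nonzero(w)} of {k} weights non-zero")
```

Because each ISTA step applies soft-thresholding, small weights are set exactly to zero rather than merely shrunk, which is the mechanism that ties $L_1$ strength to pruning in this sketch.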


Paper Structure

This paper contains 6 sections, 1 figure, and 1 table.

Introduction
Background
Experiments
Results
Discussion
Future Directions

Figures (1)

Figure 1: Simple depiction of the double descent phenomenon. $d$ is the number of parameters in the model and $n$ is the number of training data points.
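The classic curve in Figure 1 can be reproduced in miniature with a random-features model, sketched below under stated assumptions: a fixed random ReLU layer (hypothetical `relu_features`), a linear ground truth with label noise, and a last layer with $d$ parameters fit by minimum-norm least squares via `np.linalg.pinv`. For $d < n$ this is ordinary least squares; for $d \ge n$ it interpolates the training data. The test error typically spikes near $d \approx n$, but the height and sharpness of that peak depend on the seed and noise level, so no expected numbers are listed.

```python
import numpy as np


def relu_features(X, W):
    """Hypothetical random first layer: phi(x) = max(0, W x)."""
    return np.maximum(0.0, X @ W.T)


def double_descent_curve(n=40, d_in=5, widths=(10, 20, 30, 40, 50, 80, 160, 320), seed=0):
    """Return (d, train_mse, test_mse) for each last-layer width d.

    np.linalg.pinv gives the least-squares fit when d < n and the
    minimum-norm interpolant when d >= n (full row rank).
    """
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d_in)
    X_tr = rng.standard_normal((n, d_in))
    X_te = rng.standard_normal((2000, d_in))
    y_tr = X_tr @ beta + 0.5 * rng.standard_normal(n)  # noisy training labels
    y_te = X_te @ beta                                  # clean test labels
    W_all = rng.standard_normal((max(widths), d_in))    # nested random features
    curve = []
    for d in widths:
        Phi_tr = relu_features(X_tr, W_all[:d])
        Phi_te = relu_features(X_te, W_all[:d])
        w = np.linalg.pinv(Phi_tr) @ y_tr
        train_mse = float(np.mean((Phi_tr @ w - y_tr) ** 2))
        test_mse = float(np.mean((Phi_te @ w - y_te) ** 2))
        curve.append((d, train_mse, test_mse))
    return curve


n_train = 40
for d, train_mse, test_mse in double_descent_curve(n=n_train):
    print(f"d={d:4d}  d/n={d / n_train:5.2f}  train={train_mse:.3g}  test={test_mse:.3g}")
```

Reusing the first `d` rows of one random matrix keeps the feature sets nested, so the sweep changes only the width and not the features themselves.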
