A Recovery Guarantee for Sparse Neural Networks

Sara Fridovich-Keil, Mert Pilanci

TL;DR

This paper proves the first guarantees of sparse recovery for ReLU neural networks, where the sparse network weights constitute the signal to be recovered.

Abstract

We prove the first guarantees of sparse recovery for ReLU neural networks, where the sparse network weights constitute the signal to be recovered. Specifically, we study structural properties of the sparse network weights for two-layer, scalar-output networks under which a simple iterative hard thresholding algorithm recovers these weights exactly, using memory that grows linearly in the number of nonzero weights. We validate this theoretical result with simple experiments on recovery of sparse planted MLPs, MNIST classification, and implicit neural representations. Experimentally, we find performance that is competitive with, and often exceeds, a high-performing but memory-inefficient baseline based on iterative magnitude pruning. Code is available at https://github.com/voilalab/MLP-IHT.
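To make the "simple iterative hard thresholding algorithm" concrete, here is a minimal, generic sketch of IHT on a planted sparse least-squares problem. It is not the authors' implementation (their code is at the repository linked above), and it operates on a plain random matrix rather than on a ReLU network. The names `hard_threshold`, `iht`, `step`, and `iters`, the step-size choice, and the problem sizes are all assumptions made for illustration.

```python
import numpy as np


def hard_threshold(w, s):
    """Keep the s largest-magnitude entries of w and set the rest to zero."""
    out = np.zeros_like(w)
    if s > 0:
        keep = np.argpartition(np.abs(w), -s)[-s:]
        out[keep] = w[keep]
    return out


def iht(A, y, s, step=None, iters=500):
    """Generic IHT for min_w 0.5 * ||A w - y||^2 subject to ||w||_0 <= s.

    Illustrative sketch only: the step size defaults to 1 / ||A||_2^2,
    a conservative choice that keeps the objective from increasing.
    """
    p = A.shape[1]
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    w = np.zeros(p)
    for _ in range(iters):
        grad = A.T @ (A @ w - y)                 # gradient of the least-squares loss
        w = hard_threshold(w - step * grad, s)   # gradient step, then projection
    return w


# Planted sparse signal (hypothetical sizes, chosen only for the demo).
rng = np.random.default_rng(0)
n, p, s = 200, 400, 5
A = rng.standard_normal((n, p)) / np.sqrt(n)
w_true = np.zeros(p)
support = rng.choice(p, size=s, replace=False)
w_true[support] = rng.choice([-1.0, 1.0], size=s) * rng.uniform(1.0, 2.0, size=s)
y = A @ w_true

w_hat = iht(A, y, s)
print("nonzeros in estimate:", np.count_nonzero(w_hat))
print("relative error:", np.linalg.norm(w_hat - w_true) / np.linalg.norm(w_true))
```

For simplicity the sketch stores the iterate as a dense vector, so it does not demonstrate the memory claim from the abstract; an implementation aiming for memory linear in the number of nonzero weights would presumably keep only the nonzero entries and their indices.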

Paper Structure

This paper contains 30 sections, 10 theorems, 51 equations, 10 figures, 2 tables.

Key Result

Lemma 1

Let $A \in \mathbb{R}^{n \times d p}$ be as defined in eq:convexAhnotation, with the modification that all columns are normalized to have unit $\ell_2$ norm. Assume that the entries of the data matrix $X \in \mathbb{R}^{n \times d}$ are drawn i.i.d. from $\mathcal{N}(0,1)$ and that Assumption 2 holds. Consider an … Here $\varepsilon$ and $\gamma$ are the same as in the assumptions, and $c$ is a positive universal con…

Figures (10)

  • Figure 1: Average PSNR for fitting a planted one-hidden-layer (left) and two-hidden-layer (right) sparse scalar-output MLP of hidden dimension $m$ (vertical axis) and at most $s$ nonzero parameters (horizontal axis). Colorbar shows average PSNR over 3 random trials. IHT exhibits more robust performance than a strong but memory-inefficient iterative magnitude pruning (IMP) baseline [frankle2018lottery].
  • Figure 2: Average PSNR for fitting a planted one-hidden-layer (left) and two-hidden-layer (right) sparse vector-output (10-dimensional output) MLP of hidden dimension $m$ (vertical axis) and at most $s$ nonzero parameters (horizontal axis). Colorbar shows average PSNR over 3 random trials. IHT is competitive with a strong but memory-inefficient iterative magnitude pruning (IMP) baseline [frankle2018lottery].
  • Figure 3: Average binary (left) and 10-class (right) classification accuracy for handwritten MNIST digits with a 2-layer (one-hidden-layer) MLP of hidden dimension $m$ (vertical axis) and at most $s$ nonzero parameters (horizontal axis). Colorbar shows average classification accuracy over 3 random trials. IHT exhibits more robust performance than a strong but memory-inefficient iterative magnitude pruning (IMP) baseline [frankle2018lottery].
  • Figure 4: Average 1-hidden-layer (left) and 2-hidden-layer (right) PSNR for overfitting an MNIST digit image with an MLP-based implicit neural representation [tancik2020fourfeat] of hidden dimension $m$ (vertical axis) and at most $s$ nonzero parameters (horizontal axis). Colorbar shows average PSNR over 3 random trials. IHT exhibits more robust performance than a strong but memory-inefficient iterative magnitude pruning (IMP) baseline [frankle2018lottery]. We highlight that IHT exhibits stable recovery independent of $m$, in line with our theoretical results (see \ref{remark:ndependence}). In contrast, IMP shows improved recovery with increasing $m$, likely because IMP here is solving a nonconvex optimization problem whose landscape is made more benign by increasing $m$.
  • Figure 5: Average 1-hidden-layer (left) and 2-hidden-layer (right) PSNR for overfitting a CIFAR-10 image with an MLP-based implicit neural representation [tancik2020fourfeat] of hidden dimension $m$ (vertical axis) and at most $s$ nonzero parameters (horizontal axis). Colorbar shows average PSNR over 3 random trials. IHT exhibits more robust performance than a strong but memory-inefficient iterative magnitude pruning (IMP) baseline [frankle2018lottery].
  • ...and 5 more figures
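
The figure captions above report results as PSNR. As a reminder of what that metric measures, here is a minimal sketch using the standard definition, PSNR = 10 log10(peak^2 / MSE). The function name `psnr` and the default `peak=1.0` are assumptions for illustration; the captions do not state which peak value or normalization the authors use.

```python
import numpy as np


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB, using the standard definition.

    `peak` is the assumed maximum signal value (hypothetical default of 1.0).
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # a perfect fit has unbounded PSNR
    return 10.0 * np.log10(peak ** 2 / mse)


# A uniform error of 0.1 against a peak of 1.0 gives an MSE of 0.01.
print(psnr(np.zeros(4), np.full(4, 0.1)))
```

Higher is better: each 10 dB increase corresponds to a tenfold reduction in mean squared error.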

Theorems & Definitions (19)

  • Remark 1: Sample complexity
  • Lemma 1: Restricted strong convexity and restricted smoothness
  • Theorem 1: IHT recovers sparse MLP weights
  • Remark 2
  • proof
  • Theorem 2: Hanson-Wright [boucheronbook]
  • proof
  • Theorem 3: [jain2014iterative]
  • Lemma 2: based on [ergen2019random]
  • proof
  • ...and 9 more
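
Lemma 1 in the list above is titled in terms of restricted strong convexity (RSC) and restricted smoothness (RSS). For readers unfamiliar with these terms, the block below gives their standard textbook form at sparsity level $s$. It is a sketch of the usual definitions, not a quotation from the paper; the symbols $f$, $w_1$, $w_2$, $\alpha_s$, and $L_s$ are assumed notation, and the paper's exact statement and constants may differ.

```latex
% Standard definitions (assumed notation, not quoted from the paper).
% f : R^p -> R is differentiable; w_1, w_2 range over vectors with at most s nonzeros.
\begin{align*}
\text{RSC with parameter } \alpha_s > 0:\quad
  f(w_1) - f(w_2) &\ge \langle \nabla f(w_2),\, w_1 - w_2 \rangle
                     + \frac{\alpha_s}{2}\,\|w_1 - w_2\|_2^2, \\
\text{RSS with parameter } L_s > 0:\quad
  f(w_1) - f(w_2) &\le \langle \nabla f(w_2),\, w_1 - w_2 \rangle
                     + \frac{L_s}{2}\,\|w_1 - w_2\|_2^2.
\end{align*}
% For the least-squares loss f(w) = (1/2) \|A w - y\|_2^2, both conditions reduce to
%   \alpha_s \|v\|_2^2 \le \|A v\|_2^2 \le L_s \|v\|_2^2
% for every difference v = w_1 - w_2 of two s-sparse vectors.
```

Read this way, a lemma of this kind says that the matrix $A$ behaves like a well-conditioned map when restricted to sparse directions, which is the type of condition under which iterative hard thresholding guarantees are typically stated.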