
Pruning neural networks without any data by iteratively conserving synaptic flow

Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, Surya Ganguli

TL;DR

The paper addresses the challenge of pruning neural networks without access to data by identifying layer-collapse as the central obstacle in initialization pruning and deriving conservation laws for synaptic saliency. It introduces Iterative Synaptic Flow Pruning (SynFlow), a data-agnostic algorithm that preserves the total flow of synaptic strengths and, under a framework of iterative, positive, conservative scoring, guarantees Maximal Critical Compression. The approach rivals or surpasses data-dependent pruning methods across multiple architectures and datasets, especially at high sparsity, while avoiding layer-collapse. This work challenges the notion that data is necessary to determine synaptic importance at initialization and offers a principled path to highly sparse, trainable subnetworks without pre-training.

Abstract

Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we identify highly sparse trainable subnetworks at initialization, without ever training, or indeed without ever looking at the data? We provide an affirmative answer to this question through theory driven algorithm design. We first mathematically formulate and experimentally verify a conservation law that explains why existing gradient-based pruning algorithms at initialization suffer from layer-collapse, the premature pruning of an entire layer rendering a network untrainable. This theory also elucidates how layer-collapse can be entirely avoided, motivating a novel pruning algorithm Iterative Synaptic Flow Pruning (SynFlow). This algorithm can be interpreted as preserving the total flow of synaptic strengths through the network at initialization subject to a sparsity constraint. Notably, this algorithm makes no reference to the training data and consistently competes with or outperforms existing state-of-the-art pruning algorithms at initialization over a range of models (VGG and ResNet), datasets (CIFAR-10/100 and Tiny ImageNet), and sparsity constraints (up to 99.99 percent). Thus our data-agnostic pruning algorithm challenges the existing paradigm that, at initialization, data must be used to quantify which synapses are important.
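The idea of scoring weights without data and pruning in many small steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a bias-free chain of linear layers, a data-free objective $R = \mathbb{1}^\top |W_L| \cdots |W_1| \mathbb{1}$, a score equal to each absolute weight times the gradient of $R$ with respect to it, and an exponential schedule for the number of weights kept. None of these specifics are stated in the abstract, and the names `flow_scores` and `iterative_prune` are hypothetical.

```python
import numpy as np

def flow_scores(weights, masks):
    """Score each weight as |w| * dR/d|w| for R = 1^T |W_L| ... |W_1| 1.
    The objective is an assumption; it uses no training data."""
    abs_w = [np.abs(w) * m for w, m in zip(weights, masks)]
    # Forward pass on an all-ones input: fwd[l] is the input to layer l.
    fwd = [np.ones(abs_w[0].shape[1])]
    for a in abs_w:
        fwd.append(a @ fwd[-1])
    # Backward pass: bwd[l] is the gradient of R w.r.t. the output of layer l.
    bwd = [np.ones(abs_w[-1].shape[0])]
    for a in reversed(abs_w[1:]):
        bwd.append(a.T @ bwd[-1])
    bwd.reverse()
    return [a * np.outer(b, f) for a, b, f in zip(abs_w, bwd, fwd[:-1])]

def iterative_prune(weights, compression, n_iters=100):
    """Re-score and prune globally over n_iters steps (assumed schedule)."""
    masks = [np.ones_like(w) for w in weights]
    sizes = [w.size for w in weights]
    total = sum(sizes)
    for k in range(1, n_iters + 1):
        keep = int(total / compression ** (k / n_iters))
        scores = flow_scores(weights, masks)
        flat = np.concatenate([s.ravel() for s in scores])
        top = np.argsort(flat)[::-1][:keep]
        new_flat = np.zeros_like(flat)
        new_flat[top] = 1.0
        new_flat *= flat > 0  # never revive a weight whose score is zero
        parts = np.split(new_flat, np.cumsum(sizes)[:-1])
        masks = [p.reshape(w.shape) for p, w in zip(parts, weights)]
    return masks

rng = np.random.default_rng(0)
layer_shapes = [(16, 8), (16, 16), (4, 16)]  # (outputs, inputs) per layer
weights = [rng.normal(size=s) for s in layer_shapes]
masks = iterative_prune(weights, compression=10.0)
print([int(m.sum()) for m in masks])  # weights remaining in each layer
```

Because the scores are recomputed after every small pruning step, a layer that has become thin carries larger per-weight scores in the next round, which is the intuition behind avoiding layer-collapse.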

Paper Structure

This paper contains 17 sections, 3 theorems, 8 equations, 10 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Neuron-wise Conservation of Synaptic Saliency. For a feedforward neural network with continuous, homogeneous activation functions, $\phi(x) = \phi'(x)x$ (e.g. ReLU, Leaky ReLU, linear), the sum of the synaptic saliency for the incoming parameters (including the bias) to a hidden neuron ($\mathcal{S}$...
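The conservation property can be checked numerically on a tiny network. The sketch below assumes that the saliency of a parameter is the parameter multiplied by the gradient of a scalar function $R$ of the output with respect to it, and that the conserved quantities are the incoming and outgoing saliency sums of each hidden neuron; the statement above is truncated, so both points are assumptions rather than quotations. The choice $R = \tfrac{1}{2}\lVert y \rVert^2$ and the function name `neuron_saliency_sums` are illustrative only.

```python
import numpy as np

def neuron_saliency_sums(seed=0):
    """Return per-neuron incoming and outgoing saliency sums for a
    one-hidden-layer ReLU network with R = 0.5 * ||y||^2 (assumed R)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=5)
    W1, b1 = rng.normal(size=(4, 5)), rng.normal(size=4)
    W2, b2 = rng.normal(size=(3, 4)), rng.normal(size=3)

    # Forward pass.
    z = W1 @ x + b1
    h = np.maximum(z, 0.0)          # ReLU satisfies phi(x) = phi'(x) * x
    y = W2 @ h + b2

    # Manual backward pass for R = 0.5 * ||y||^2.
    g_y = y
    g_W2 = np.outer(g_y, h)
    g_h = W2.T @ g_y
    g_z = g_h * (z > 0)
    g_W1 = np.outer(g_z, x)
    g_b1 = g_z

    # Saliency = gradient * parameter, summed per hidden neuron.
    s_in = (g_W1 * W1).sum(axis=1) + g_b1 * b1   # incoming weights and bias
    s_out = (g_W2 * W2).sum(axis=0)              # outgoing weights
    return s_in, s_out

s_in, s_out = neuron_saliency_sums()
print(np.allclose(s_in, s_out))
```

The two sums agree because, for a homogeneous activation, both reduce to the hidden activation times the gradient of $R$ with respect to that activation.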

Figures (10)

  • Figure 1: Layer-collapse leads to a sudden drop in accuracy. Top-1 test accuracy as a function of the compression ratio for a VGG-16 model pruned at initialization and trained on CIFAR-100. Colored arrows represent the critical compression of the corresponding pruning algorithm. Only our algorithm, SynFlow, reaches the theoretical limit of max compression (black dashed line) without collapsing the network. See the experiments section for more details.
  • Figure 2: Where does layer-collapse occur? Fraction of parameters remaining at each layer of a VGG-19 model pruned at initialization with ImageNet over a range of compression ratios ($10^{n}$ for $n=0,0.5,\dots,6.0$). A higher transparency represents a higher compression ratio. A dashed line indicates that there is at least one layer with no parameters, implying layer-collapse has occurred.
  • Figure 3: Neuron-wise conservation of score. Each dot represents a hidden unit from the feature-extractor of a VGG-19 model pruned at initialization with ImageNet. The location of each dot corresponds to the total score for the unit's incoming and outgoing parameters, $(\mathcal{S}^{\text{in}}, \mathcal{S}^{\text{out}})$. The black dotted line represents exact neuron-wise conservation of score.
  • Figure 4: Inverse relationship between layer size and average layer score. Each dot represents a layer from a VGG-19 model pruned at initialization with ImageNet. The location of each dot corresponds to the layer's average score and inverse number of elements. The black dotted line represents a perfect linear relationship.
  • Figure 5: How IMP avoids layer-collapse. (a) Multiple iterations of training-pruning cycles are needed to prevent IMP from suffering layer-collapse. (b) The average square magnitude scores per layer, originally at initialization (blue), converge through training towards a linear relationship with the inverse layer size after training (pink), suggesting layer-wise conservation. All data is from a VGG-19 model trained on CIFAR-10.
  • ...and 5 more figures
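
The abstract quotes sparsity in percent, while the figure captions use compression ratios of the form $10^{n}$. Assuming the usual convention that the compression ratio $\rho$ is the number of original parameters divided by the number remaining after pruning (a definition not spelled out in this summary), the two are related by

$$
\text{sparsity} = 1 - \frac{1}{\rho}, \qquad \rho = 10^{4} \;\Rightarrow\; \text{sparsity} = 1 - 10^{-4} = 99.99\%.
$$

Under that convention, the largest compression ratio in Figure 2, $\rho = 10^{6}$, leaves one parameter in a million.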

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • proof
  • proof
  • proof