Data-free parameter pruning for Deep Neural Networks

Suraj Srinivas, R. Venkatesh Babu

TL;DR

The paper introduces a data-free, neuron-level pruning technique that identifies and removes redundant or near-identical neurons by a saliency-based criterion, then merges their contributions ('surgery'). It connects conceptually to Optimal Brain Damage and Knowledge Distillation while avoiding training data, enabling rapid pruning of fully connected layers. Across LeNet and AlexNet-style networks, the method achieves substantial parameter reductions with minimal accuracy loss and demonstrates superior speed compared to traditional pruning methods. A practical heuristic based on saliency curves guides the pruning extent, making the approach scalable to large networks. Overall, the work offers a fast, data-free path to compress networks with little performance penalty, applicable to common FC-layer architectures.

Abstract

Deep neural networks (NNs) with millions of parameters are at the heart of many state-of-the-art computer vision systems today. However, recent works have shown that much smaller models can achieve similar levels of performance. In this work, we address the problem of pruning parameters in a trained NN model. Instead of removing individual weights one at a time, as done in previous works, we remove one neuron at a time. We show how similar neurons are redundant, and propose a systematic way to remove them. Our experiments in pruning the densely connected layers show that we can remove up to 85% of the total parameters in an MNIST-trained network, and about 35% for AlexNet, without significantly affecting performance. Our method can be applied on top of most networks with a fully connected layer to give a smaller network.
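The claim that similar neurons are redundant can be checked in its simplest form, where two hidden neurons have exactly equal incoming weights. The following is a minimal NumPy sketch rather than the authors' code: the single ReLU hidden layer, the omission of bias terms, and the names `forward` and `merge_duplicate` are all assumptions made for illustration.

```python
import numpy as np

def forward(W, a, x):
    """One hidden ReLU layer followed by a single linear output: a . relu(W x)."""
    return a @ np.maximum(W @ x, 0.0)

def merge_duplicate(W, a, keep, drop):
    """Remove hidden neuron `drop` and add its outgoing weight to neuron `keep`.

    Hypothetical helper: it assumes rows `keep` and `drop` of W are equal,
    so both neurons always produce the same activation.
    """
    a_new = a.copy()
    a_new[keep] += a_new[drop]
    return np.delete(W, drop, axis=0), np.delete(a_new, drop)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # 4 hidden neurons, 3 inputs; row i is the weight-set of neuron i
W[3] = W[0]                   # make the fourth weight-set equal to the first
a = rng.normal(size=4)        # outgoing weights of the hidden neurons
x = rng.normal(size=3)        # an arbitrary input

W_small, a_small = merge_duplicate(W, a, keep=0, drop=3)

# The smaller network computes the same output as the original one.
print(np.isclose(forward(W, a, x), forward(W_small, a_small, x)))
```

Because the two neurons always emit the same activation, summing their outgoing weights preserves the output for every input, so no training data is needed to justify the removal.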

Paper Structure

This paper contains 13 sections, 1 theorem, 14 equations, 4 figures, 2 tables.

Key Result

Lemma 1

Let $a, b \in \mathcal{R}$ and $h(\cdot)$ be a monotonically increasing function such that $\max\left( \frac{\mathrm{d}h(x)}{\mathrm{d}x}\right) \leq 1, \forall x \in \mathcal{R}$. Then,

Figures (4)

  • Figure 1: A toy example showing the effect of equal weight-sets ($W_1 = W_4$). The circles in the diagram are neurons and the lines represent weights. Weights of the same colour in the input layer constitute a weight-set.
  • Figure 2: (a) Scaled appropriately, the saliency curve closely follows the increase in test error; (b) The histogram of saliency values. The black bar indicates the mode of the Gaussian-like curve.
  • Figure 3: Comparison of the proposed approach with OBD and OBS. Our method is able to prune many more weights than OBD/OBS with little or no increase in test error.
  • Figure 4: Comparison with and without surgery. Our method breaks down when surgery is not performed. Note that the y-axis is the log of test error.
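The roles of saliency and surgery in the captions above can be illustrated with one pruning step on a small layer. This is a hedged sketch, not the paper's implementation. It assumes a saliency of the form $s_{i,j} = \langle a_j^2 \rangle \, \lVert W_i - W_j \rVert_2^2$, which is not stated in the text above. The names `saliency_matrix` and `prune_one` are hypothetical, and the random probe inputs `X` are used only to measure the effect, not to choose what to prune.

```python
import numpy as np

def saliency_matrix(W, A):
    """Assumed saliency: s[i, j] = mean_k(A[k, j]**2) * ||W[i] - W[j]||^2."""
    diff = W[:, None, :] - W[None, :, :]     # pairwise weight-set differences, shape (n, n, d)
    dist2 = np.sum(diff ** 2, axis=-1)       # squared distances, shape (n, n)
    out2 = np.mean(A ** 2, axis=0)           # mean squared outgoing weight per neuron, shape (n,)
    s = dist2 * out2[None, :]
    np.fill_diagonal(s, np.inf)              # a neuron is never paired with itself
    return s

def prune_one(W, A, surgery=True):
    """Remove neuron j of the lowest-saliency pair (i, j); optionally fold it into i."""
    s = saliency_matrix(W, A)
    i, j = np.unravel_index(np.argmin(s), s.shape)
    A_new = A.copy()
    if surgery:
        A_new[:, i] += A_new[:, j]           # surgery: neuron i takes over j's contribution
    return np.delete(W, j, axis=0), np.delete(A_new, j, axis=1)

def forward(W, A, X):
    """Hidden ReLU layer followed by a linear output layer (biases omitted)."""
    return A @ np.maximum(W @ X, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))                  # 6 hidden neurons, 4 inputs
W[5] = W[2] + 1e-4 * rng.normal(size=4)      # plant one near-duplicate neuron
A = rng.normal(size=(3, 6))                  # 3 outputs
X = rng.normal(size=(4, 100))                # probe inputs for measurement only

y = forward(W, A, X)
err_with = np.abs(forward(*prune_one(W, A, surgery=True), X) - y).max()
err_without = np.abs(forward(*prune_one(W, A, surgery=False), X) - y).max()
print(err_with, err_without)
```

In this toy setting the pruning decision uses only the weights, and removing the neuron without surgery discards its whole contribution to the output, whereas surgery leaves only the small mismatch between the two near-identical neurons.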

Theorems & Definitions (2)

  • Lemma 1
  • Proof of Lemma 1