Provable Filter Pruning for Efficient Neural Networks

Lucas Liebenwein, Cenk Baykal, Harry Lang, Dan Feldman, Daniela Rus

TL;DR

The paper tackles the challenge of pruning large CNNs efficiently while guaranteeing both reduced model size and preserved performance. It introduces a data-informed, sampling-based filter pruning method that uses empirical sensitivities to construct an importance distribution for filters, applying a Bernstein-based analysis to guarantee (1±ε) accuracy with probability at least 1−δ and automatic per-layer budget allocation. The approach yields provable compression bounds, scales across architectures and datasets, and demonstrates superior sparsity and runtime efficiency compared with state-of-the-art pruning methods on MNIST, CIFAR-10, and ImageNet, including real-time regression tasks. This enables hardware-agnostic speedups without heavy hyperparameter tuning and provides a practical subroutine for broader neural network optimization tasks such as lottery tickets and architecture search.

Abstract

We present a provable, sampling-based approach for generating compact Convolutional Neural Networks (CNNs) by identifying and removing redundant filters from an over-parameterized network. Our algorithm uses a small batch of input data points to assign a saliency score to each filter and constructs an importance sampling distribution where filters that highly affect the output are sampled with correspondingly high probability. In contrast to existing filter pruning approaches, our method is simultaneously data-informed, exhibits provable guarantees on the size and performance of the pruned network, and is widely applicable to varying network architectures and data sets. Our analytical bounds bridge the notions of compressibility and importance of network structures, which gives rise to a fully-automated procedure for identifying and preserving filters in layers that are essential to the network's performance. Our experimental evaluations on popular architectures and data sets show that our algorithm consistently generates sparser and more efficient models than those constructed by existing filter pruning approaches.
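
The sampling step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_filters`, the example saliency values, and the choice to draw a fixed number of samples with replacement are all assumptions made here for clarity. It normalizes per-filter saliency scores into a probability distribution, draws filter indices from it, and keeps only the filters that were drawn at least once.

```python
import random


def sample_filters(saliency, num_samples, seed=0):
    """Draw filter indices with probability proportional to saliency.

    Hypothetical helper: returns (kept, probs), where `kept` is the sorted
    list of distinct indices drawn at least once and `probs` is the
    normalized importance distribution. Filters that are never drawn are
    the ones a pruning step would remove.
    """
    if any(s < 0 for s in saliency):
        raise ValueError("saliency scores must be non-negative")
    total = sum(saliency)
    if total <= 0:
        raise ValueError("at least one saliency score must be positive")

    # Importance distribution: high-saliency filters get high probability.
    probs = [s / total for s in saliency]

    # Sample with replacement; a seeded generator keeps the run repeatable.
    rng = random.Random(seed)
    draws = rng.choices(range(len(saliency)), weights=probs, k=num_samples)
    return sorted(set(draws)), probs


# Made-up saliency scores for a layer with five filters.
saliency = [0.9, 0.05, 0.0, 0.6, 0.02]
kept, probs = sample_filters(saliency, num_samples=8)
print("kept filters:", kept)
```

A filter with zero saliency has zero sampling probability and is therefore always pruned in this sketch, while filters with large scores are very likely to survive. How many samples to draw per layer is exactly the budget question that the paper's analysis is said to automate; the value 8 above is arbitrary.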

Paper Structure

This paper contains 31 sections, 10 theorems, 43 equations, 18 figures, 8 tables, 2 algorithms.

Key Result

Theorem 1

Let $Y_1, \ldots, Y_m$ be a sequence of $m$ i.i.d. random variables satisfying $\max_{k \in [m]} \, \left| Y_k - \mathbb{E}[Y_k] \right| \leq R$, and let $Y = \sum_{k=1}^m Y_k$ denote their sum. Then, for every $\varepsilon \geq 0$, $\delta \in (0, 1)$, we have that $\mat

Figures (18)

  • Figure 1: Overview of our pruning method. We use a small batch of data points to quantify the relative importance $s_j^\ell$ of each filter $W_j^\ell$ in layer $\ell$ by considering the importance of the corresponding feature map $a_j^\ell = \phi(z_j^\ell)$ in computing the output $z^{\ell+1}$ of layer $\ell+1$, where $\phi(\cdot)$ is the non-linear activation function. We then prune filters by sampling each filter $j$ with probability proportional to $s_j^\ell$ and removing the filters that were not sampled. We invoke the filter pruning procedure for each layer to obtain the pruned network (the prune step); we then retrain the pruned network (the retrain step), and repeat the prune-retrain cycle iteratively.
  • Figure 2: VGG16 architecture
  • Figure 3: Budget Allocation for VGG16
  • Figure 5: LeNet5
  • Figure 6: ResNet56
  • ...and 13 more figures
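
The importance score $s_j^\ell$ from the Figure 1 caption can be illustrated with a small sketch. The exact sensitivity definition is not reproduced in this summary, so the formula below is an assumption: it scores filter $j$ by the largest share that its feature map contributes to any next-layer output over a small batch. It also assumes non-negative weights and activations, and collapses each feature map to a single scalar. The function name `empirical_sensitivity` and the example numbers are hypothetical.

```python
def empirical_sensitivity(activations, weights):
    """Score each filter by its largest relative contribution downstream.

    activations: one list per input in the batch; activations[n][j] >= 0 is
        the (scalar, for simplicity) feature map value of filter j.
    weights: weights[i][j] >= 0 connects filter j to next-layer unit i.

    Assumed proxy: s[j] = max over inputs n and units i of
        weights[i][j] * a[j] / sum_k(weights[i][k] * a[k]).
    """
    num_filters = len(weights[0])
    scores = [0.0] * num_filters
    for a in activations:
        for row in weights:
            z = sum(w * x for w, x in zip(row, a))  # next-layer pre-activation
            if z == 0:
                continue  # no filter contributes to this unit on this input
            for j in range(num_filters):
                scores[j] = max(scores[j], row[j] * a[j] / z)
    return scores


# Two inputs, three filters, two next-layer units (made-up values).
batch = [[1.0, 0.0, 3.0], [2.0, 1.0, 1.0]]
weights = [[1.0, 1.0, 0.0], [0.5, 0.0, 0.5]]
scores = empirical_sensitivity(batch, weights)
print(scores)
```

Taking the maximum over the batch means a filter counts as important if it matters a lot for at least one input, even when it is inactive on the others. Scores of this kind could then be normalized into the sampling probabilities the caption refers to.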

Theorems & Definitions (18)

  • Theorem 1: Bernstein's inequality [vershynin2016high]
  • Theorem 2
  • Theorem 3
  • Definition 1: Edge Sensitivity [blg2018]
  • Definition 2: Neuron Sensitivity
  • Theorem 3
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • ...and 8 more