Neural network relief: a pruning algorithm based on neural activity

Aleksandr Dekhovich, David M. J. Tax, Marcel H. F. Sluiter, Miguel A. Bessa

TL;DR

This work proposes an iterative pruning strategy built on a simple importance-score metric that deactivates unimportant connections. It tackles overparameterization in DNNs and modulates the firing patterns, aiming to find the smallest number of connections that can still solve a given task with comparable accuracy.

Abstract

Current deep neural networks (DNNs) are overparameterized and use most of their neuronal connections during inference for each task. The human brain, however, developed specialized regions for different tasks and performs inference with a small fraction of its neuronal connections. We propose an iterative pruning strategy introducing a simple importance-score metric that deactivates unimportant connections, tackling overparameterization in DNNs and modulating the firing patterns. The aim is to find the smallest number of connections that is still capable of solving a given task with comparable accuracy, i.e. a simpler subnetwork. We achieve comparable performance for LeNet architectures on MNIST, and significantly higher parameter compression than state-of-the-art algorithms for VGG and ResNet architectures on CIFAR-10/100 and Tiny-ImageNet. Our approach also performs well for the two different optimizers considered -- Adam and SGD. The algorithm is not designed to minimize FLOPs when considering current hardware and software implementations, although it performs reasonably when compared to the state of the art.
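
The abstract does not spell out the importance-score metric, so the following is only a minimal sketch of one pruning pass for a single fully connected layer, under stated assumptions. It assumes that a connection's score is its average normalized absolute contribution $|w_{ij} x_i|$ to the pre-activation of neuron $j$, and that each neuron keeps the fewest connections whose cumulative score reaches a fraction `alpha` of its total. The function names, the `alpha` parameter, and the exact normalization are hypothetical and may differ from the definitions in the paper body. Retraining between pruning iterations is omitted.

```python
import numpy as np


def importance_scores(W, b, X):
    """Assumed activity-based score for each connection of one dense layer.

    W: (m_in, m_out) weights, b: (m_out,) biases,
    X: (n_samples, m_in) activations of the previous layer.
    Returns an (m_in, m_out) array of non-negative scores.
    """
    # Absolute contribution of input i to neuron j, for every sample.
    contrib = np.abs(X[:, :, None] * W[None, :, :])      # (n, m_in, m_out)
    # Normalize by the total absolute signal reaching neuron j (bias included).
    total = contrib.sum(axis=1, keepdims=True) + np.abs(b)[None, None, :]
    total = np.where(total == 0.0, 1.0, total)           # avoid division by zero
    return (contrib / total).mean(axis=0)                # average over samples


def prune_mask(scores, alpha=0.95):
    """Keep, per output neuron, the fewest top-scoring inputs whose
    cumulative score reaches alpha times that neuron's total score."""
    m_in, m_out = scores.shape
    mask = np.zeros((m_in, m_out), dtype=bool)
    for j in range(m_out):
        order = np.argsort(scores[:, j])[::-1]           # most important first
        cum = np.cumsum(scores[order, j])
        k = int(np.searchsorted(cum, alpha * cum[-1])) + 1
        mask[order[:min(k, m_in)], j] = True
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Layer sizes borrowed from Figure 1: 4 input neurons, 2 output neurons.
    W = rng.normal(size=(4, 2))
    b = rng.normal(size=2)
    X = rng.normal(size=(100, 4))
    mask = prune_mask(importance_scores(W, b, X), alpha=0.9)
    W_pruned = W * mask                                  # deactivated weights become 0
    print("kept connections:", int(mask.sum()), "of", mask.size)
```

In an iterative scheme, a pass like this would alternate with retraining of the surviving weights; lowering `alpha` trades accuracy for higher compression.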

Paper Structure

This paper contains 15 sections, 13 equations, 11 figures, 10 tables, 2 algorithms.

Figures (11)

  • Figure 1: Neural network layers $l-1$ and $l$ with $m_{l-1} = 4$ and $m_l = 2$ neurons, respectively. The weight associated with the connection between neuron $i$ in layer $l-1$ and neuron $j$ in layer $l$ is $w_{ij}^{(l)}$.
  • Figure 2: LeNet-300-100 architecture on MNIST before and after pruning, where connections are coloured with respect to importance score: blue (least important) $\to$ red (most important).
  • Figure 3: Architecture of the VGG-like network on CIFAR-10 trained with the Adam optimizer, for three random initializations.
  • Figure 4: Results for VGG-13 over three seeds; dots show mean values and error bars show standard deviations. We compare two optimizers, SGD and Adam, and two different values of weight decay, evaluated after 5 pruning iterations.
  • Figure 5: VGG-13 architecture on CIFAR-100 trained with SGD (top) and Adam (bottom), with weight decay equal to $5 \cdot 10^{-4}$ (left) and $10^{-4}$ (right), and pruned for 5 iterations. The results for three different seeds are presented.
  • ...and 6 more figures