An Effective Information Theoretic Framework for Channel Pruning

Yihao Chen; Zefang Wang

TL;DR

This paper presents a novel channel pruning approach grounded in information theory and the interpretability of neural networks. It uses information concentration, rather than heuristics and engineering tuning, as the reference for setting layer-wise pruning ratios, which makes the chosen ratios more interpretable.

Abstract

Channel pruning is a promising method for accelerating and compressing convolutional neural networks. However, current pruning algorithms leave two problems unsolved: how to assign layer-wise pruning ratios properly, and how to discard the least important channels using a convincing criterion. In this paper, we present a novel channel pruning approach based on information theory and the interpretability of neural networks. Specifically, we regard information entropy as the expected amount of information in a convolutional layer. In addition, if we view a matrix as a system of linear equations, a higher-rank matrix means that more solutions exist, which indicates more uncertainty. From the point of view of information theory, the rank can therefore also describe the amount of information. Treating rank and entropy as two information indicators of a convolutional layer, we propose a fusion function that reaches a compromise between them, and we define the fused result as "information concentration". When pre-defining layer-wise pruning ratios, we use the information concentration as a reference instead of heuristic and engineering tuning, which provides a more interpretable solution. Moreover, we leverage Shapley values, a potent tool in the interpretability of neural networks, to evaluate channel contributions and discard the least important channels, compressing the model while maintaining its performance. Extensive experiments demonstrate the effectiveness and promising performance of our method. For example, for ResNet-56 on CIFAR-10, our method improves accuracy by 0.21% while reducing FLOPs by 45.5% and removing 40.3% of the parameters. For ResNet-50 on ImageNet, our method loses only 0.43%/0.11% in Top-1/Top-5 accuracy while reducing FLOPs by 41.6% and removing 35.0% of the parameters.
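The two information indicators named in the abstract can be illustrated with a minimal NumPy sketch. The function names `average_rank` and `shannon_entropy`, the histogram-based entropy estimate, and the choice of 32 bins are assumptions made for illustration, not the paper's implementation. The fusion function is not specified in the abstract, so the sketch stops at the two indicators.

```python
import numpy as np


def average_rank(feature_maps: np.ndarray) -> float:
    """Mean matrix rank over every (sample, channel) feature map.

    feature_maps has shape (batch, channels, height, width).
    """
    # matrix_rank treats the last two axes as a stack of matrices.
    ranks = np.linalg.matrix_rank(feature_maps)
    return float(ranks.mean())


def shannon_entropy(feature_maps: np.ndarray, bins: int = 32) -> float:
    """Histogram estimate of the Shannon entropy (in bits) of the activations.

    The bin count is an illustrative assumption.
    """
    counts, _ = np.histogram(feature_maps, bins=bins)
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log2(probs)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for one convolutional layer's output on a batch of images.
    layer_output = rng.normal(size=(8, 16, 14, 14))
    print("average rank:", average_rank(layer_output))
    print("entropy (bits):", shannon_entropy(layer_output))
```

In practice the array would come from a forward hook on a convolutional layer, and the two numbers would be averaged over several sampled batches before being fused.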

Paper Structure (6 sections, 8 equations, 3 figures)

Introduction
Related Works
Methodology
Problem Formulation
Information Concentration
Channel Pruning via Shapley Values

Figures (3)

Figure 1: Overview of our proposed method. The boxes denote channels, and those drawn with dashed lines are removed. In the inference stage, we feed the sampled data to the network and obtain the information concentration and the Shapley values. We then assign layer-wise pruning ratios according to the information concentration. In the pruning stage, we sort the channels by their importance scores, given by the Shapley values, and discard the least important ones in each layer. Finally, we fine-tune the pruned model to recover its accuracy.
Figure 2: Average statistics of rank and entropy for convolutional layer outputs under various input image batches. In each sub-figure, the x-axis shows the convolutional layer index and the y-axis shows the number of image batches. The batch size is 256 for ResNet-20 and ResNet-32, and 32 for ResNet-18 and ResNet-34. The sub-figures show that the rank and entropy of each convolutional layer's output (the columns of each sub-figure) remain unchanged irrespective of the image batches.
Figure 3: Average statistics of rank and entropy for convolutional layer outputs, and their per-layer fusion inside a stage, for ResNets of four depths.
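The pruning stage described in the Figure 1 caption, scoring channels by Shapley values and discarding the lowest-scoring ones in a layer, can be sketched as follows. This is a generic Monte Carlo permutation estimator and not necessarily the estimator used in the paper. The names `shapley_channel_scores`, `channels_to_keep`, and `value_fn` are hypothetical: `value_fn` stands in for any routine that evaluates the network with only the masked-in channels of one layer active.

```python
from typing import Callable

import numpy as np


def shapley_channel_scores(
    value_fn: Callable[[np.ndarray], float],
    num_channels: int,
    num_permutations: int = 200,
    seed: int = 0,
) -> np.ndarray:
    """Monte Carlo estimate of each channel's Shapley value.

    value_fn takes a boolean mask of active channels and returns a score
    (for example, validation accuracy). It is a hypothetical placeholder.
    """
    rng = np.random.default_rng(seed)
    scores = np.zeros(num_channels)
    for _ in range(num_permutations):
        mask = np.zeros(num_channels, dtype=bool)
        previous = value_fn(mask.copy())
        # Add channels one at a time in random order and credit each
        # channel with its marginal contribution.
        for channel in rng.permutation(num_channels):
            mask[channel] = True
            current = value_fn(mask.copy())
            scores[channel] += current - previous
            previous = current
    return scores / num_permutations


def channels_to_keep(scores: np.ndarray, pruning_ratio: float) -> np.ndarray:
    """Indices of the channels that survive after dropping the lowest scores."""
    num_pruned = int(round(pruning_ratio * len(scores)))
    ascending = np.argsort(scores)
    return np.sort(ascending[num_pruned:])


if __name__ == "__main__":
    # Toy additive value function: each channel contributes a fixed amount.
    weights = np.array([0.5, 0.1, 0.3, 0.05])
    scores = shapley_channel_scores(lambda m: float(weights[m].sum()), 4)
    print("scores:", scores)
    print("kept channels:", channels_to_keep(scores, pruning_ratio=0.5))
```

The layer-wise `pruning_ratio` would be supplied from the information concentration of that layer. Each permutation costs one model evaluation per channel, so the number of permutations trades estimation accuracy against compute.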
