Filter Pruning based on Information Capacity and Independence

Xiaolong Tang, Shuo Ye, Yufeng Shi, Tianheng Hu, Qinmu Peng, Xinge You

TL;DR

The paper tackles the practical challenge of pruning CNNs with a dual-metric framework that evaluates each filter from both an individual (local) and an overall (global) perspective. It introduces information capacity, based on entropy-like estimates of a filter's weight distribution, and information independence, which captures inter-filter redundancy, then fuses the two metrics with a balancing parameter to score filters for pruning in a data-free but feature-guided manner. The approach yields substantial compression and speedups across CIFAR-10/100, ImageNet, and MS COCO with small accuracy loss; for example, on ResNet-50/ILSVRC-2012 it reduces FLOPs by 77.4% and parameters by 69.3% at the cost of a 2.64% drop in accuracy. The results are presented as evidence of the method's robustness, its scalability to transformers and segmentation tasks, and its practical potential for deploying pruned models on resource-constrained devices.

Abstract

Filter pruning has gained widespread adoption for compressing and speeding up convolutional neural networks (CNNs). However, existing approaches are still far from practical application due to biased filter selection and heavy computation cost. This paper introduces a new filter pruning method that selects filters in an interpretable, multi-perspective, and lightweight manner. Specifically, we evaluate the contributions of filters from both individual and overall perspectives. For the amount of information contained in each filter, a new metric called information capacity is proposed. Inspired by information theory, we utilize the interpretable entropy to measure the information capacity, and develop a feature-guided approximation process. For correlations among filters, another metric called information independence is designed. Since the aforementioned metrics are evaluated in a simple but effective way, we can identify and prune the least important filters with less computation cost. We conduct comprehensive experiments on benchmark datasets employing various widely used CNN architectures to evaluate the performance of our method. For instance, on ILSVRC-2012, our method outperforms state-of-the-art methods by reducing FLOPs by 77.4% and parameters by 69.3% for ResNet-50 with only a minor decrease in accuracy of 2.64%.
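The idea of measuring a filter's information with entropy can be illustrated with a short sketch. It assumes a histogram-based Shannon entropy over each filter's weights; the function name `filter_entropy`, the bin count, and the shared histogram range are illustrative choices rather than details from the paper, and the paper's feature-guided approximation is not reproduced here.

```python
import numpy as np


def filter_entropy(weights: np.ndarray, bins: int = 16) -> np.ndarray:
    """Histogram-based Shannon entropy (in bits) of each filter's weights.

    `weights` has shape (out_channels, in_channels, kH, kW).
    Assumption: all filters in the layer share one histogram range so that
    their entropies are comparable; the paper may estimate this differently.
    """
    flat = weights.reshape(weights.shape[0], -1)
    lo, hi = float(flat.min()), float(flat.max())
    entropies = np.empty(flat.shape[0])
    for i, w in enumerate(flat):
        counts, _ = np.histogram(w, bins=bins, range=(lo, hi))
        p = counts[counts > 0] / counts.sum()  # drop empty bins: 0 * log 0 := 0
        entropies[i] = -(p * np.log2(p)).sum()
    return entropies


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = rng.normal(size=(8, 3, 3, 3))  # a toy layer with 8 filters
    layer[0] = 0.0                         # a constant filter carries no information
    print(filter_entropy(layer))
```

Under this estimate a constant filter scores zero, while filters whose weights spread over many bins score higher, which is the intuition behind treating low-entropy filters as candidates for pruning.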

Paper Structure (21 sections, 7 equations, 6 figures, 8 tables, 1 algorithm)

Figures (6)

  • Figure 1: Framework of the proposed approach. First, given a pre-trained network, we calculate the information capacity and information independence of each filter, respectively. Then, we weight and sum these two metrics to obtain an importance score for each filter. Finally, we prune the filters with the lowest scores according to a predetermined pruning rate, resulting in a compressed network.
  • Figure 2: The graph displays the disparities in rank and entropy of feature maps produced from the same layer. The x-axis indicates the index of feature maps, and the y-axis represents the mean rank and entropy values of feature maps. The outcomes indicate that entropy can capture the knowledge contained in each feature map in a more detailed way, leading to better differentiation of the information conveyed by distinct feature maps.
  • Figure 3: The correlation between feature map entropy and corresponding filter entropy. The left diagram pertains to VGG-16 on CIFAR-10, while the right diagram pertains to ResNet-50 on ILSVRC-2012.
  • Figure 4: Top-1 accuracy for different values of the metric weight $\sigma$. The upper figure shows the accuracy of the pruned VGG-16, and the lower figure shows that of the pruned ResNet-56. The x-axis denotes the weight value, and the y-axis denotes the accuracy of the pruned model. Notably, the proposed method achieves optimal performance when $\sigma$ is within the range of 0.7 to 0.9.
  • Figure 5: Feature maps output by an intermediate layer in the first block of ResNet-50. The input image is 224×224, the pruning rate is 10%, and the feature maps are numbered from 0 to 63, with filters corresponding one-to-one to the feature maps. We independently evaluate the importance of filters using information capacity and information independence. The purple box marks the least important filters selected only by information capacity, with indices (9, 17); the orange box marks the least important filters selected only by information independence, with indices (49, 60); and the red box marks filters selected by both, with indices (0, 10, 19, 59, 61).
  • ...and 1 more figures
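
The scoring and selection steps described in the Figure 1 caption (weight and sum the two metrics, then prune the lowest-scoring filters at a predetermined rate) can be sketched as follows. This is a minimal illustration under stated assumptions: the per-filter capacity and independence values are taken as given, the min-max normalization, the convention that $\sigma$ weights capacity and $1 - \sigma$ weights independence, the rounding down of the number of pruned filters, and the function names are all hypothetical choices, not details confirmed by the source.

```python
import numpy as np


def min_max(x: np.ndarray) -> np.ndarray:
    """Rescale scores to [0, 1] so the two metrics are on a comparable scale."""
    span = x.max() - x.min()
    return np.zeros_like(x, dtype=float) if span == 0 else (x - x.min()) / span


def select_filters_to_prune(capacity, independence, sigma=0.8, prune_rate=0.1):
    """Return sorted indices of the lowest-scoring filters in one layer.

    Assumption: importance = sigma * capacity + (1 - sigma) * independence,
    both min-max normalized; which metric sigma multiplies is a guess.
    """
    capacity = np.asarray(capacity, dtype=float)
    independence = np.asarray(independence, dtype=float)
    score = sigma * min_max(capacity) + (1.0 - sigma) * min_max(independence)
    n_prune = int(prune_rate * score.size)         # rounding down is an assumption
    lowest = np.argsort(score, kind="stable")[:n_prune]
    return np.sort(lowest).tolist()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cap = rng.random(64)   # placeholder capacity scores for 64 filters
    ind = rng.random(64)   # placeholder independence scores
    print(select_filters_to_prune(cap, ind, sigma=0.8, prune_rate=0.1))
```

The default `sigma=0.8` is chosen only because the Figure 4 caption reports the best accuracy for $\sigma$ between 0.7 and 0.9; in practice the weight and the pruning rate would be set per layer or per experiment.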