Filter Pruning based on Information Capacity and Independence
Xiaolong Tang, Shuo Ye, Yufeng Shi, Tianheng Hu, Qinmu Peng, Xinge You
TL;DR
The paper tackles the practical challenge of pruning CNNs with a dual-metric framework that evaluates filters from both individual (local) and overall (global) perspectives. It introduces two metrics: information capacity, based on entropy-like estimates of a filter's weight distribution, and information independence, which captures inter-filter redundancy. The two are fused with a balancing parameter to score filters for pruning in a data-free but feature-guided manner. The approach yields substantial compression and speedups across CIFAR-10/100, ImageNet, and MS COCO with small accuracy loss (e.g., for ResNet-50 on ILSVRC-2012, 77.4% fewer FLOPs and 69.3% fewer parameters at a 2.64% accuracy decrease). The results indicate that the method is robust, extends to transformers and segmentation tasks, and is practical for deploying pruned models on resource-constrained devices.
Abstract
Filter pruning has been widely adopted for compressing and accelerating convolutional neural networks (CNNs). However, existing approaches are still far from practical application due to biased filter selection and heavy computation cost. This paper introduces a new filter pruning method that selects filters in an interpretable, multi-perspective, and lightweight manner. Specifically, we evaluate the contributions of filters from both individual and overall perspectives. To measure the amount of information contained in each filter, we propose a new metric called information capacity. Inspired by information theory, we use the interpretable entropy to measure the information capacity and develop a feature-guided approximation process. To capture correlations among filters, we design another metric called information independence. Because both metrics are evaluated in a simple but effective way, we can identify and prune the least important filters at low computation cost. We conduct comprehensive experiments on benchmark datasets with various widely used CNN architectures to evaluate the performance of our method. For instance, on ILSVRC-2012, our method outperforms state-of-the-art methods by reducing FLOPs by 77.4% and parameters by 69.3% for ResNet-50 with only a minor decrease in accuracy of 2.64%.
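To make the two-metric idea concrete, the following is a minimal sketch of how a layer's filters could be scored and ranked. It is not the paper's formulation: the abstract does not give the exact definitions, so the histogram entropy used for capacity, the cosine-similarity measure used for independence, the min-max normalization, and the names `information_capacity`, `information_independence`, `filter_scores`, `filters_to_prune`, `bins`, and `lam` are all illustrative assumptions. The feature-guided approximation mentioned above is not modeled here.

```python
import numpy as np


def information_capacity(weights, bins=16):
    """Assumed proxy: Shannon entropy (bits) of each filter's weight histogram.

    weights has shape (num_filters, ...); all filters share one bin range.
    """
    n = weights.shape[0]
    flat = weights.reshape(n, -1)
    lo, hi = flat.min(), flat.max()
    caps = np.empty(n)
    for i, w in enumerate(flat):
        hist, _ = np.histogram(w, bins=bins, range=(lo, hi))
        p = hist[hist > 0] / hist.sum()  # drop empty bins to avoid log(0)
        caps[i] = -(p * np.log2(p)).sum()
    return caps


def information_independence(weights, eps=1e-12):
    """Assumed proxy: 1 minus mean absolute cosine similarity to other filters."""
    n = weights.shape[0]
    flat = weights.reshape(n, -1)
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + eps)
    sim = np.abs(unit @ unit.T)
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    return 1.0 - sim.sum(axis=1) / max(n - 1, 1)


def _minmax(x):
    """Rescale to [0, 1] so the two metrics are comparable before fusion."""
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def filter_scores(weights, lam=0.5):
    """Fuse both metrics; lam is the (hypothetical) balancing parameter."""
    cap = _minmax(information_capacity(weights))
    ind = _minmax(information_independence(weights))
    return lam * cap + (1.0 - lam) * ind


def filters_to_prune(weights, ratio, lam=0.5):
    """Return sorted indices of the lowest-scoring fraction of filters."""
    k = int(ratio * weights.shape[0])
    return np.sort(np.argsort(filter_scores(weights, lam))[:k])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    conv_weights = rng.normal(size=(8, 3, 3, 3))  # 8 filters of shape 3x3x3
    print(filters_to_prune(conv_weights, ratio=0.25))
```

In this sketch, setting `lam=1.0` ranks filters by capacity alone and `lam=0.0` by independence alone; intermediate values trade off the individual and overall views in the spirit of the fusion described above.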
