An Information Theory-inspired Strategy for Automatic Network Pruning

Xiawu Zheng, Yuexiao Ma, Teng Xi, Gang Zhang, Errui Ding, Yuchao Li, Jie Chen, Yonghong Tian, Rongrong Ji

TL;DR

This paper addresses the burden of manually tuning network pruning under varying device constraints by proposing ITPruner, a search-free method driven by information theory. It builds on the information bottleneck theory and a stable layer-importance indicator based on normalized HSIC ($nHSIC$) to rank layer redundancy, and it casts pruning under a budget as a convex linear program with quadratic constraints that is solvable in seconds. Theoretical analysis links minimizing $nHSIC$ with linear kernels to reducing mutual information under Gaussian assumptions, and experiments across CNNs, ViTs, detection, and segmentation show better compression-accuracy trade-offs with zero search epochs and strong transferability. The method enables practical, device-agnostic model compression, reducing computation and memory without extensive hyperparameter searches or retraining. Overall, ITPruner offers a scalable, theoretically grounded approach to automatic model compression for resource-constrained deployments.
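
The TL;DR describes $nHSIC$ as the layer-importance indicator. Below is a minimal sketch of that quantity. It assumes linear kernels and the biased HSIC estimator, since this summary does not state the kernel or estimator; under those assumptions $nHSIC$ coincides with linear CKA. `nhsic` is a hypothetical helper name, not the API of the ITPruner repository.

```python
import numpy as np


def _centered_linear_gram(features: np.ndarray) -> np.ndarray:
    """Double-centered linear-kernel Gram matrix H K H over n samples."""
    # Flatten each sample's activations, e.g. (n, C, H, W) -> (n, C*H*W).
    x = features.reshape(features.shape[0], -1).astype(np.float64)
    k = x @ x.T                                   # linear kernel K = X X^T
    n = k.shape[0]
    h = np.eye(n) - np.full((n, n), 1.0 / n)      # centering matrix H = I - 11^T / n
    return h @ k @ h


def nhsic(x: np.ndarray, y: np.ndarray) -> float:
    """Normalized HSIC with linear kernels (assumption):
    HSIC(X, Y) / sqrt(HSIC(X, X) * HSIC(Y, Y)).
    The 1/(n-1)^2 factor of the biased estimator cancels in the ratio."""
    kc, lc = _centered_linear_gram(x), _centered_linear_gram(y)
    cross = np.sum(kc * lc)                       # tr(HKH HLH) = tr(K H L H)
    return float(cross / np.sqrt(np.sum(kc * kc) * np.sum(lc * lc)))


# Usage: 64 sampled inputs, a layer with 16 channels of 4x4 feature maps.
rng = np.random.default_rng(0)
feat_a = rng.standard_normal((64, 16, 4, 4))
feat_b = 2.0 * feat_a + 1.0                       # affine copy of the same activations
print(nhsic(feat_a, feat_b))                      # ~1.0 up to floating-point error
```

Centering removes constant offsets and the normalization removes scale, so an affine copy of a layer scores 1. In the scalar case the value reduces to the squared Pearson correlation.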

Abstract

Despite their superior performance on many computer vision tasks, deep convolutional neural networks are well known to require compression before they can be deployed on devices with resource constraints. Most existing network pruning methods require laborious human effort and prohibitive computational resources, especially when the constraints change. This limits the practical application of model compression when a model needs to be deployed on a wide range of devices. Moreover, existing methods still lack theoretical guidance. In this paper, we propose an information theory-inspired strategy for automatic model compression. The principle behind our method is the information bottleneck theory, i.e., hidden representations should compress information with respect to each other. We thus introduce the normalized Hilbert-Schmidt Independence Criterion (nHSIC) on network activations as a stable and generalized indicator of layer importance. When a resource constraint is given, we integrate the nHSIC indicator with the constraint to transform the architecture search problem into a linear programming problem with quadratic constraints. Such a problem is easily solved by a convex optimization method within a few seconds. We also provide a rigorous proof that optimizing the normalized HSIC simultaneously minimizes the mutual information between different layers. Without any search process, our method achieves better compression trade-offs than state-of-the-art compression algorithms. For instance, with ResNet-50, we achieve a 45.3% FLOPs reduction with 75.75% top-1 accuracy on ImageNet. Code is available at https://github.com/MAC-AutoML/ITPruner/tree/master.
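
The claim that optimizing nHSIC also minimizes mutual information can be checked in the simplest setting. The derivation below is a one-dimensional special case under stated assumptions, not the paper's proof. It assumes jointly Gaussian scalars $X, Y$ with correlation $\rho$ and linear kernels $k(x,x') = xx'$, $l(y,y') = yy'$. For these kernels the population HSIC is the squared covariance.

```latex
\begin{aligned}
\mathrm{HSIC}(X,Y) &= \operatorname{Cov}(X,Y)^2, \qquad
\mathrm{HSIC}(X,X) = \operatorname{Var}(X)^2 \\
\mathrm{nHSIC}(X,Y) &= \frac{\mathrm{HSIC}(X,Y)}{\sqrt{\mathrm{HSIC}(X,X)\,\mathrm{HSIC}(Y,Y)}}
  = \frac{\operatorname{Cov}(X,Y)^2}{\operatorname{Var}(X)\,\operatorname{Var}(Y)} = \rho^2 \\
I(X;Y) &= -\tfrac{1}{2}\log\!\left(1-\rho^2\right)
  = -\tfrac{1}{2}\log\!\bigl(1-\mathrm{nHSIC}(X,Y)\bigr)
\end{aligned}
```

Because $-\tfrac{1}{2}\log(1-t)$ is strictly increasing on $[0,1)$, lowering nHSIC lowers $I(X;Y)$ in this special case.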

Paper Structure

This paper contains 18 sections, 9 theorems, 38 equations, 10 figures, 13 tables, 1 algorithm.

Key Result

Theorem 1

Assume that the input random variable $X$ follows a Markov random field structure and that the Markov random field is ergodic. For a network with $L$ hidden representations $X_1, X_2, \ldots, X_L$, with probability $1-\delta$ the generalization error $\epsilon$ is bounded by …, where $n$ is the number of training examples.

Figures (10)

  • Figure 1: Overview of our ITPruner. Specifically, we first sample $n$ images to obtain the feature map of each layer. Then the normalized HSIC is employed to calculate the independence map $\boldsymbol{H}$ between different layers. For each layer, we sum the elements of the corresponding column in $\boldsymbol{H}$, excluding the layer's own diagonal entry, as the importance indicator of the layer. In this way, we model the layer-wise importance and compression constraints as a linear programming problem. Finally, the optimal network architecture is obtained by solving this linear program.
  • Figure 2: Sparsity ratio vs. accuracy trade-offs for different unstructured pruning methods and ITPruner on ResNet-50 (left) and VGG-19 (right). ITPruner outperforms the other baselines by a clear margin, especially under large compression ratios.
  • Figure 3: FLOPs, size, GPU latency, and CPU latency vs. accuracy trade-offs for different pruning methods and ITPruner on ResNet-50 (top) and MobileNetV1 (bottom). All the models are searched or adjusted according to FLOPs. ITPruner outperforms the other baselines by a clear margin in most cases.
  • Figure 4: The number of channels for VGG found by ITPruner (red) and an evolutionary algorithm (green). The blue line denotes the uncompressed VGG architecture.
  • Figure 5: Accuracy (blue) and variance (green) of layer-wise importance under different values of $\beta$.
  • ...and 5 more figures
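
The Figure 1 caption describes how the layer-wise indicator is built from the independence map $\boldsymbol{H}$. Below is a minimal sketch of that step. It assumes linear-kernel nHSIC and feature maps already collected for $n$ sampled images. The helper names `independence_map` and `layer_scores` are hypothetical. Any later rescaling of the scores before the linear program (for example, through the $\beta$ studied in Figure 5) is not modeled.

```python
import numpy as np


def nhsic(x: np.ndarray, y: np.ndarray) -> float:
    """Linear-kernel normalized HSIC (assumed kernel choice)."""
    def centered_gram(f: np.ndarray) -> np.ndarray:
        f = f.reshape(f.shape[0], -1).astype(np.float64)   # (n, features)
        n = f.shape[0]
        h = np.eye(n) - np.full((n, n), 1.0 / n)            # centering matrix
        return h @ (f @ f.T) @ h

    kc, lc = centered_gram(x), centered_gram(y)
    return float(np.sum(kc * lc) / np.sqrt(np.sum(kc * kc) * np.sum(lc * lc)))


def independence_map(layer_features: list) -> np.ndarray:
    """H[i, j] = nHSIC(X_i, X_j) between the feature maps of layers i and j."""
    num_layers = len(layer_features)
    H = np.ones((num_layers, num_layers))                   # nHSIC(X, X) = 1 on the diagonal
    for i in range(num_layers):
        for j in range(i + 1, num_layers):
            H[i, j] = H[j, i] = nhsic(layer_features[i], layer_features[j])
    return H


def layer_scores(H: np.ndarray) -> np.ndarray:
    """Per-layer indicator: column sum of H excluding the diagonal entry."""
    return H.sum(axis=0) - np.diag(H)


# Usage with synthetic activations standing in for three layers.
rng = np.random.default_rng(0)
n = 256                                                      # sampled images (hypothetical)
x1 = rng.standard_normal((n, 4, 2, 2))
x2 = np.maximum(x1, 0.0)                                     # a layer derived from x1 via ReLU
x3 = rng.standard_normal((n, 2, 2, 2))                       # an unrelated layer
H = independence_map([x1, x2, x3])
print(np.round(H, 3))
print(layer_scores(H))
```

Only the upper triangle is computed because linear-kernel nHSIC is symmetric. The resulting per-layer scores are what the caption then combines with the compression constraint in the linear program.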

Theorems & Definitions (15)

  • Theorem 1
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Theorem 4
  • proof
  • Theorem 5
  • proof
  • Theorem 6
  • ...and 5 more