Small Contributions, Small Networks: Efficient Neural Network Pruning Based on Relative Importance

Mostafa Hussien, Mahmoud Afifi, Kim Khoa Nguyen, Mohamed Cheriet

TL;DR

This paper introduces an intuitive and interpretable pruning method based on activation statistics, rooted in information theory and statistical analysis, that consistently outperforms several baseline and state-of-the-art pruning techniques.

Abstract

Recent advancements have scaled neural networks to unprecedented sizes, achieving remarkable performance across a wide range of tasks. However, deploying these large-scale models on resource-constrained devices poses significant challenges due to substantial storage and computational requirements. Neural network pruning has emerged as an effective technique to mitigate these limitations by reducing model size and complexity. In this paper, we introduce an intuitive and interpretable pruning method based on activation statistics, rooted in information theory and statistical analysis. Our approach leverages the statistical properties of neuron activations to identify and remove weights with minimal contributions to neuron outputs. Specifically, we build a distribution of weight contributions across the dataset and utilize its parameters to guide the pruning process. Furthermore, we propose a Pruning-aware Training strategy that incorporates an additional regularization term to enhance the effectiveness of our pruning method. Extensive experiments on multiple datasets and network architectures demonstrate that our method consistently outperforms several baseline and state-of-the-art pruning techniques.

Paper Structure

This paper contains 8 sections, 8 equations, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: The blind range of various activation functions, defined as the interval in which the gradient of the activation function is zero. In this range, the function's output remains constant, providing a "safe zone" for pruning, where changes to the weights do not affect the model's output. A wider blind range offers greater flexibility for pruning algorithms, allowing for more aggressive weight reduction without impacting performance. The blind range is highlighted in yellow.
  • Figure 2: A simplified illustrative example of the proposed pruning method applied to a single-node architecture. The left panel depicts a single node receiving three inputs, each connected by a corresponding weight. The center panel shows the calculation of the node output, $a_n$, prior to any pruning. The right panel demonstrates the effect of pruning the second weight, $w_{1,0} = 1.0$, and its subsequent impact on the final node output.
  • Figure 3: We propose a pruning metric based on the relative weight contribution of each neuron. The contribution function is computed by measuring the relative contribution of each neuron in every network layer. The illustration shows the $i$-th column of the fully connected weight matrix. We feed training samples into this layer to compute the output features (in blue). Then, we mask the $i$-th column (set it to zero) and compute the output without its influence. The relative contribution of the weights in the $i$-th column (in red) is then computed for each training example, yielding the distribution of contributions for this column.
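
The masking procedure described in the Figure 3 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `column_contributions`, the layout `y = X @ W.T` (with `W` of shape outputs × inputs), and in particular the ratio used as the "relative contribution" (norm of the output change divided by norm of the full output) are hypothetical choices, since the captions do not give the exact formula.

```python
import numpy as np

def column_contributions(W, X, i, eps=1e-12):
    """Per-sample relative contribution of column i of W.

    Assumed (hypothetical) definition: ||y - y_masked|| / (||y|| + eps),
    where y is the layer output and y_masked is the output after
    zeroing column i of the weight matrix.
    """
    full = X @ W.T                       # outputs with all weights, (n_samples, n_out)
    W_masked = W.copy()
    W_masked[:, i] = 0.0                 # mask the i-th column
    masked = X @ W_masked.T              # outputs without column i
    change = np.linalg.norm(full - masked, axis=1)
    return change / (np.linalg.norm(full, axis=1) + eps)

# Toy layer: 3 inputs, 4 outputs; column 1 is made nearly irrelevant.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
W[:, 1] *= 1e-3
X = rng.normal(size=(256, 3))            # stand-in for training samples

# One distribution per column, summarized here by its mean and std.
means = [column_contributions(W, X, i).mean() for i in range(3)]
stds = [column_contributions(W, X, i).std() for i in range(3)]
```

In this toy setup the down-scaled column has a much smaller mean contribution than the other two, which is the kind of signal a contribution-based pruning rule could use. How the distribution's parameters are turned into a pruning decision is left open here, since the captions do not specify it.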