FAIR-Pruner: Leveraging Tolerance of Difference for Flexible Automatic Layer-Wise Neural Network Pruning

Chenqing Lin; Mostafa Hussien; Chengyao Yu; Bingyi Jing; Mohamed Cheriet; Osama Abdelrahman; Ruixing Ming

TL;DR

FAIR-Pruner addresses non-uniform, layer-wise pruning of neural networks by introducing Tolerance of Difference (ToD), an indicator that balances a Wasserstein-based Utilization Score against a Taylor-based Reconstruction Score to determine per-layer pruning budgets. The method decouples threshold determination from importance estimation, enabling fast, flexible one-shot pruning controlled by a preset ToD level $\alpha$, which guides the per-layer pruning counts $\hat{m}^{(l)}$. Its key components are the Wasserstein distance, used to measure unit-level discriminative power; a first-order Taylor approximation of each unit's impact on the loss; and a pruning strategy based on quantile thresholds for both scores. Empirically, FAIR-Pruner achieves state-of-the-art or competitive accuracy at high compression on CIFAR-10, SVHN, and ImageNet with architectures such as VGG, AlexNet, ResNet, and DenseNet, while providing substantial speedups and low overhead. ToD-derived layer-wise ratios also improve existing saliency metrics (e.g., L1), yielding better accuracy than uniform pruning, which highlights the method's practical value for efficient deployment on edge devices.

Abstract

Neural network pruning has been widely adopted to reduce the parameter count of complex neural networks, enabling efficient deployment on resource-limited edge devices. Mainstream pruning methods typically adopt uniform pruning strategies, which tend to cause substantial performance degradation at high sparsity levels. Recent studies focus on non-uniform layer-wise pruning, but such approaches typically depend on global architecture optimization, which is computationally expensive and lacks flexibility. To address these limitations, this paper proposes a novel method named Flexible Automatic Identification and Removal (FAIR)-Pruner, which adaptively determines the sparsity level of each layer and identifies the units to be pruned. The core of FAIR-Pruner is a novel indicator, Tolerance of Differences (ToD), designed to balance the importance scores obtained from two complementary perspectives: the architecture level (Utilization Score) and the task level (Reconstruction Score). By controlling ToD at preset levels, FAIR-Pruner determines layer-specific thresholds and removes units whose Utilization Scores fall below the corresponding thresholds. Furthermore, by decoupling threshold determination from importance estimation, FAIR-Pruner allows users to flexibly obtain pruned models at varying pruning ratios. Extensive experiments demonstrate that FAIR-Pruner achieves state-of-the-art performance, maintaining higher accuracy even at high compression ratios. Moreover, the ToD-based layer-wise pruning ratios can be applied directly to existing importance measures, improving their performance over uniform pruning.
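
The interplay of the two scores can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the quantile-grid approximation of the Wasserstein distance, and in particular the conflict ratio used as a stand-in for ToD are assumptions made for illustration, since the abstract does not give the exact definitions.

```python
import numpy as np

def wasserstein_1d(x, y, k=100):
    """Approximate W1 distance between two 1-D samples via a shared quantile grid."""
    q = (np.arange(k) + 0.5) / k
    return float(np.mean(np.abs(np.quantile(x, q) - np.quantile(y, q))))

def taylor_scores(acts, grads):
    """First-order Taylor estimate of each unit's loss impact.

    acts, grads: arrays of shape (n_samples, n_units) holding activations
    and the loss gradients with respect to those activations.
    """
    return np.abs(np.mean(acts * grads, axis=0))

def prune_count(util, recon, alpha):
    """Hypothetical ToD-style rule (an assumption, not the paper's formula).

    For each candidate count m, compare the m least-utilized units with the
    m most loss-critical units. The overlap fraction plays the role of ToD;
    the largest m whose overlap fraction stays within alpha is returned.
    """
    by_util = np.argsort(util).tolist()     # least utilized first
    by_recon = np.argsort(-recon).tolist()  # most loss-critical first
    best = 0
    for m in range(1, len(by_util) + 1):
        conflict = len(set(by_util[:m]) & set(by_recon[:m]))
        if conflict / m <= alpha:
            best = m
    return best, by_util[:best]

# Toy layer with six units whose two scores agree perfectly.
util = np.array([0.1, 0.2, 0.3, 0.9, 0.8, 0.7])
recon = np.array([0.1, 0.2, 0.3, 0.9, 0.8, 0.7])
m_hat, pruned = prune_count(util, recon, alpha=0.0)
```

In this sketch, raising `alpha` tolerates more disagreement between the two rankings and therefore prunes more units in the layer. Because the scores are computed once and only the threshold search depends on `alpha`, different pruning ratios can be obtained without re-estimating importance.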
