Layer-adaptive sparsity for the Magnitude-based Pruning

Jaeho Lee, Sejun Park, Sangwoo Mo, Sungsoo Ahn, Jinwoo Shin

TL;DR

The paper addresses the open question of how to choose layerwise sparsity in magnitude-based pruning by introducing Layer-Adaptive Magnitude-based Pruning (LAMP). LAMP assigns each weight a score that normalizes its squared magnitude by the sum over the surviving weights in the same layer, so that global pruning on this score automatically determines the layerwise sparsity without hyperparameter tuning. The score is motivated theoretically by minimizing model-level output distortion, and it is validated across diverse CNN architectures and image datasets, as well as language modeling tasks, where it consistently outperforms baseline sparsity schemes and remains robust in ablations. The analysis also shows that LAMP recovers intuitive heuristics (e.g., preserving early and late layers) and, at high sparsity levels, spreads the surviving weights comparatively uniformly across layers, which may benefit memory capacity and expressivity.

Abstract

Recent discoveries on neural network pruning reveal that, with a carefully chosen layerwise sparsity, a simple magnitude-based pruning achieves state-of-the-art tradeoff between sparsity and performance. However, without a clear consensus on "how to choose," the layerwise sparsities are mostly selected algorithm-by-algorithm, often resorting to handcrafted heuristics or an extensive hyperparameter search. To fill this gap, we propose a novel importance score for global pruning, coined layer-adaptive magnitude-based pruning (LAMP) score; the score is a rescaled version of weight magnitude that incorporates the model-level $\ell_2$ distortion incurred by pruning, and does not require any hyperparameter tuning or heavy computation. Under various image classification setups, LAMP consistently outperforms popular existing schemes for layerwise sparsity selection. Furthermore, we observe that LAMP continues to outperform baselines even in weight-rewinding setups, while the connectivity-oriented layerwise sparsity (the strongest baseline overall) performs worse than a simple global magnitude-based pruning in this case. Code: https://github.com/jaeho-lee/layer-adaptive-sparsity
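
Written out, the score described above takes the following form. This is a sketch in notation chosen here for illustration: $W$ denotes the weights of a single layer flattened into a vector and indexed in ascending order of magnitude, so that $|W[u]| \le |W[v]|$ whenever $u < v$.

$$\mathrm{score}(u; W) = \frac{(W[u])^2}{\sum_{v \ge u} (W[v])^2}$$

The denominator sums over the weight itself and every larger-magnitude weight in the same layer, i.e., the weights that remain once all smaller ones have been pruned. The largest weight in each layer therefore always has score 1, regardless of the overall scale of that layer.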

Paper Structure

This paper contains 16 sections, 16 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: The LAMP score is a squared weight magnitude, normalized by the sum of all "surviving weights" in the layer. Global pruning by LAMP is equivalent to the layerwise magnitude-based pruning with an automatically chosen layerwise sparsity.
  • Figure 2: Sparsity-accuracy tradeoff curves of VGG-16, ResNet-18, DenseNet-121, and EfficientNet-B0. All models are iteratively pruned and retrained with CIFAR-10 dataset.
  • Figure 3: Sparsity-accuracy tradeoff curves of pruned models trained for SVHN and CIFAR-100 (on VGG-16) and Restricted ImageNet (on ResNet-34).
  • Figure 4: Sparsity-accuracy tradeoff curves under one-shot pruning, weight rewinding, and the SNIP setup. One-shot pruning and the weight-rewinding experiments are done on VGG-16 trained on CIFAR-10 dataset. SNIP experiment is performed on Conv-6 trained on CIFAR-10.
  • Figure 5: Layerwise statistics of VGG-16 iteratively pruned on CIFAR-10. Top: Layerwise survival rate for models with $\{51.2\%,26.2\%,13.4\%,6.87\%,3.52\%\}$ weights surviving. Bottom: Number of nonzero weights for models with $\{3.52\%,1.80\%,0.92\%,0.47\%,0.24\%\}$ weights surviving.
  • ...and 2 more figures
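
The equivalence stated in the Figure 1 caption can be illustrated with a small pure-Python sketch. It is not the authors' implementation (see the linked repository for that); the function names `lamp_scores` and `lamp_global_prune`, the toy layers, and the tie-breaking by sort order are assumptions made for illustration.

```python
from typing import Dict, List


def lamp_scores(weights: List[float]) -> List[float]:
    """LAMP-style score for each weight of one flattened layer.

    score(u) = w_u^2 / (sum of w_v^2 over weights v in the same layer whose
    magnitude is at least that of u). Ties are broken by sort order, which
    is an arbitrary choice made for this sketch.
    """
    # Visit indices from largest to smallest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    scores = [0.0] * len(weights)
    running = 0.0  # sum of squares of the weights visited so far
    for i in order:
        sq = weights[i] ** 2
        running += sq
        scores[i] = sq / running if running > 0.0 else 0.0
    return scores


def lamp_global_prune(layers: Dict[str, List[float]], sparsity: float) -> Dict[str, List[float]]:
    """Zero out the `sparsity` fraction of weights with the lowest scores,
    ranked jointly across all layers."""
    scored = [
        (score, name, idx)
        for name, w in layers.items()
        for idx, score in enumerate(lamp_scores(w))
    ]
    scored.sort(key=lambda t: t[0])
    n_prune = int(sparsity * len(scored))
    pruned = {name: list(w) for name, w in layers.items()}
    for _, name, idx in scored[:n_prune]:
        pruned[name][idx] = 0.0
    return pruned


# Toy model (hypothetical): "fc" is "conv" scaled by 100.
layers = {
    "conv": [0.1, -0.2, 0.3, 0.4],
    "fc": [10.0, -20.0, 30.0, 40.0],
}
pruned = lamp_global_prune(layers, sparsity=0.5)
print(pruned)
```

In this toy example the two layers differ only by a factor of 100, so they receive the same scores (up to rounding) and each loses its two smallest weights. Plain global magnitude pruning at the same sparsity would instead remove every weight in `conv`. Within a layer the score is monotone in magnitude, so a global threshold on it always amounts to magnitude pruning inside each layer, with the per-layer sparsity falling out of the threshold.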