Is Complexity Required for Neural Network Pruning? A Case Study on Global Magnitude Pruning

Manas Gupta; Efe Camci; Vishandi Rudy Keneta; Abhishek Vaidyanathan; Ritwik Kanodia; Chuan-Sheng Foo; Wu Min; Lin Jie

TL;DR

This paper challenges the necessity of complex pruning strategies by proposing Global Magnitude Pruning (Global MP) as a simple, effective baseline. It introduces Minimum Threshold (MT) to prevent layer-collapse and demonstrates, through extensive experiments on CIFAR-10, ImageNet, and HAR-2 across CNNs and non-CNNs, that Global MP often outperforms state-of-the-art pruning methods in sparsity-accuracy trade-offs and remains competitive in FLOPs-accuracy, especially at very high sparsity levels. The results suggest that complexity does not necessarily translate to better pruning performance and highlight the practical value of simple, hyperparameter-light approaches. The authors also discuss limitations and outline future work, including theoretical analysis of Global MP and joint optimization over weights and FLOPs.

Abstract

Pruning neural networks became popular in the last decade after it was shown that a large number of weights can be safely removed from modern neural networks without compromising accuracy. Numerous pruning methods have been proposed since, each claiming to improve on prior art, but at the cost of increasingly complex pruning methodologies. These methodologies include using importance scores, obtaining feedback through back-propagation, and applying heuristics-based pruning rules, among others. In this work, we question whether this pattern of introducing complexity is really necessary to achieve better pruning results. We benchmark these SOTA techniques against a simple pruning baseline, namely Global Magnitude Pruning (Global MP), which ranks weights in order of their magnitudes and prunes the smallest ones. Surprisingly, we find that vanilla Global MP performs very well against the SOTA techniques. When considering the sparsity-accuracy trade-off, Global MP performs better than all SOTA techniques at all sparsity ratios. When considering the FLOPs-accuracy trade-off, some SOTA techniques outperform Global MP at lower sparsity ratios; however, Global MP starts performing well at high sparsity ratios and performs very well at extremely high sparsity ratios. Moreover, we find that a common issue that many pruning algorithms run into at high sparsity rates, namely layer-collapse, can be easily fixed in Global MP. We explore why layer-collapse occurs in networks and how it can be mitigated in Global MP by using a technique called Minimum Threshold. We showcase these findings on various models (WRN-28-8, ResNet-32, ResNet-50, MobileNet-V1 and FastGRNN) and multiple datasets (CIFAR-10, ImageNet and HAR-2). Code is available at https://github.com/manasgupta-1/GlobalMP.
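The ranking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it assumes each layer is a flat list of floats, and the function name `global_magnitude_prune`, the layer names, and the toy weights are all hypothetical.

```python
from typing import Dict, List


def global_magnitude_prune(
    layers: Dict[str, List[float]], sparsity: float
) -> Dict[str, List[int]]:
    """Return a 0/1 mask per layer that drops the globally smallest weights."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")

    # Pool every weight in the network into one ranking, ignoring layer
    # boundaries, and sort by absolute value (smallest first).
    ranked = sorted(
        (abs(w), name, i)
        for name, weights in layers.items()
        for i, w in enumerate(weights)
    )
    num_to_prune = int(sparsity * len(ranked))

    # Start with everything kept, then zero out the smallest entries.
    masks = {name: [1] * len(weights) for name, weights in layers.items()}
    for _, name, i in ranked[:num_to_prune]:
        masks[name][i] = 0
    return masks


# Hypothetical toy network: two layers with four weights each.
toy_layers = {
    "conv1": [0.9, -0.05, 0.4, -0.7],
    "fc": [0.01, -0.02, 0.03, 0.8],
}
toy_masks = global_magnitude_prune(toy_layers, sparsity=0.5)
```

Because the ranking is global, per-layer sparsity is not uniform: in this toy case "fc" loses three of its four weights while "conv1" loses only one.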

Paper Structure (21 sections, 2 equations, 5 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Method
Global Magnitude Pruning (Global MP)
Minimum Threshold (MT)
The Pruning Workflow
Experiments
Comparison with SOTA
CIFAR-10
ImageNet
Structured pruning and generalizing to other domains and RNN architectures
Mitigating layer-collapse
Discussion, Limitations and Future Work
Conclusions
Acknowledgement
...and 6 more sections

Figures (5)

Figure 1: Illustration of how Global MP works. Global MP ranks all the weights in a network by their magnitudes and prunes off the smallest weights until the target sparsity is met. Light green weights refer to the smaller-magnitude weights which are pruned off. A pruned network consisting of larger-magnitude weights (dark green weights) is obtained after the process.
Figure 2: Difference in architectures between WRN and MobileNet. WRN does not have prunable residual connections in the last layers (dotted lines) while MobileNet does. This leads to different pruning behaviors on the two architectures.
Figure 3: Ablation study on how changing the minimum threshold (MT) affects accuracy, using MobileNet-V1 on ImageNet. As with any hyper-parameter, accuracy increases as MT is raised until it reaches a maximum, after which raising MT further decreases accuracy. Hence, MT can be tuned with the same search procedure used for any other hyper-parameter.
Figure 4: For MobileNet-V2 at 98% sparsity, MT helps retain some weights in the heavily pruned layers (Layers 55, 56, and 57) and allows the model to learn successfully.
Figure 5: Layer-wise pruning results produced by Global MP on the MobileNet-V2 model on CIFAR-10. Pruning is conducted on three different pre-trained models, and the pruning results are very stable across the three runs.
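The captions for Figures 3 and 4 describe MT as a tunable value that keeps some weights alive in heavily pruned layers. One way this could work is sketched below, under the assumption that MT is a per-layer floor on the number of surviving weights; the excerpt here does not spell out the exact definition, so this is an illustration and not the authors' method. The function name `global_mp_with_mt`, the layer names, and the toy weights are hypothetical.

```python
from typing import Dict, List


def global_mp_with_mt(
    layers: Dict[str, List[float]], sparsity: float, min_threshold: int
) -> Dict[str, List[int]]:
    """Global magnitude pruning that keeps at least min_threshold weights per layer."""
    # Assumption: MT protects the largest-magnitude weights of each layer.
    protected = set()
    for name, weights in layers.items():
        order = sorted(
            range(len(weights)), key=lambda i: abs(weights[i]), reverse=True
        )
        protected.update((name, i) for i in order[:min_threshold])

    # Rank only the unprotected weights globally, smallest magnitude first.
    candidates = sorted(
        (abs(w), name, i)
        for name, weights in layers.items()
        for i, w in enumerate(weights)
        if (name, i) not in protected
    )
    total = sum(len(weights) for weights in layers.values())
    # The target may be unreachable if too many weights are protected.
    num_to_prune = min(int(sparsity * total), len(candidates))

    masks = {name: [1] * len(weights) for name, weights in layers.items()}
    for _, name, i in candidates[:num_to_prune]:
        masks[name][i] = 0
    return masks


# Hypothetical toy network whose "head" layer has uniformly small weights.
toy_layers = {
    "block": [0.9, -0.8, 0.7, 0.6],
    "head": [0.01, -0.02, 0.03, 0.04],
}
collapsed = global_mp_with_mt(toy_layers, sparsity=0.5, min_threshold=0)
rescued = global_mp_with_mt(toy_layers, sparsity=0.5, min_threshold=1)
```

With `min_threshold=0` the "head" layer is pruned entirely, which is the layer-collapse failure mode; with `min_threshold=1` its largest weight survives and the pruning budget shifts to the smallest unprotected weight in "block", so overall sparsity stays at 50%.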
