Advancing Weight and Channel Sparsification with Enhanced Saliency
Xinglong Sun, Maying Shen, Hongxu Yin, Lei Mao, Pavlo Molchanov, Jose M. Alvarez
TL;DR
This work tackles pruning limitations arising from imperfect importance scores and irreversible removals by introducing Iterative Exploitation-Exploration (IEE), which splits a model into an active sparse structure and an exploration space. Through a five-stage cycle (Importance Estimation, Prune, Accuracy Improvement, Reactivate & Explore, and Grow), IEE uses a single saliency criterion to both prune and regrow parameters, enabling robust exploration and improved final sparsity. Across structured and unstructured sparsity on ImageNet, PASCAL VOC, and CIFAR-10, IEE achieves state-of-the-art accuracy with substantial training-cost reductions, including notable gains over HALP and RigL baselines. The method can start from scratch or from a pretrained model and scales to latency-aware pruning via HALP, providing practical, hardware-conscious sparsification with strong convergence and exploration metrics.
Abstract
Pruning aims to accelerate and compress models by removing redundant parameters, which are identified by specifically designed importance scores that are usually imperfect. This removal is irreversible, often leading to subpar performance in pruned models. Dynamic sparse training attempts to adjust sparse structures during training for continual reassessment and refinement, but it has several limitations, including criterion inconsistency between pruning and growth, unsuitability for structured sparsity, and short-sighted growth strategies. Our paper introduces an efficient, innovative paradigm to enhance a given importance criterion for either unstructured or structured sparsity. Our method separates the model into an active structure for exploitation and an exploration space for potential updates. During exploitation, we optimize the active structure, whereas in exploration, we reevaluate and reintegrate parameters from the exploration space through a pruning and growing step consistently guided by the same given importance criterion. To prepare for exploration, we briefly "reactivate" all parameters in the exploration space and train them for a few iterations while keeping the active part frozen, offering a preview of the potential performance gains from reintegrating these parameters. We show on various datasets and configurations that existing importance criteria, even one as simple as magnitude, can be enhanced with our method to achieve state-of-the-art performance and training cost reductions. Notably, on ImageNet with ResNet50, our method achieves a +1.3 increase in Top-1 accuracy over prior art at 90% ERK sparsity. Compared with the SOTA latency pruning method HALP, we reduce its training cost by over 70% while attaining a faster and more accurate pruned model.
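The exploitation and exploration steps described above can be illustrated with a minimal NumPy sketch on a toy least-squares problem. This is not the authors' implementation: the function names (`magnitude`, `train`, `iee_cycle`), the step counts, the learning rate, and the choice to reactivate exploration-space weights from zero are all assumptions made for illustration. The sketch only shows the general shape of one cycle, with the same criterion (here magnitude) scoring both the prune and the grow step.

```python
import numpy as np

def magnitude(w):
    """Importance criterion assumed for this sketch: absolute weight value."""
    return np.abs(w)

def train(w, X, y, trainable, steps=20, lr=0.05):
    """Gradient steps on a least-squares loss; only `trainable` entries move."""
    w = w.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w[trainable] -= lr * grad[trainable]
    return w

def iee_cycle(w, active, X, y, n_prune, n_grow, score=magnitude):
    """One hypothetical exploitation-exploration cycle over a weight vector."""
    # 1. Importance estimation on the active structure only.
    s = np.where(active, score(w), np.inf)
    # 2. Prune: move the least important active weights to the exploration space.
    active = active.copy()
    active[np.argsort(s)[:n_prune]] = False
    w = np.where(active, w, 0.0)
    # 3. Accuracy improvement: optimize the active structure (exploitation).
    w = train(w, X, y, trainable=active)
    # 4. Reactivate & explore: briefly train the exploration space
    #    while the active part stays frozen.
    w = train(w, X, y, trainable=~active, steps=5)
    # 5. Grow: score the exploration space with the SAME criterion and
    #    reintegrate the top candidates; the rest are reset to zero.
    s = np.where(active, -np.inf, score(w))
    active[np.argsort(s)[::-1][:n_grow]] = True
    w = np.where(active, w, 0.0)
    return w, active

# Toy usage: 8 weights, 4 kept active, starting from a poor initial mask.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
y = X @ np.array([2.0, -1.5, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
active = np.array([True, False, False, False, False, True, True, True])
w = np.where(active, rng.normal(size=8), 0.0)
for _ in range(3):
    w, active = iee_cycle(w, active, X, y, n_prune=2, n_grow=2)
print(active)
```

Because `n_prune` equals `n_grow` in this usage, the number of active weights stays constant across cycles; a schedule that prunes more than it grows would instead increase sparsity over time. Scoring growth candidates only after the brief frozen-backbone training is what gives the magnitude criterion something to measure for weights that were previously zero.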
