Advancing Weight and Channel Sparsification with Enhanced Saliency

Xinglong Sun, Maying Shen, Hongxu Yin, Lei Mao, Pavlo Molchanov, Jose M. Alvarez

TL;DR

This work tackles two limitations of pruning, imperfect importance scores and irreversible removals, by introducing Iterative Exploitation-Exploration (IEE), which splits a model into an active sparse structure and an exploration space. Through a five-stage cycle (Importance Estimation, Prune, Accuracy Improvement, Reactivate & Explore, and Grow), IEE consistently uses a single saliency criterion to both prune and regrow parameters, enabling robust exploration and better final sparse architectures. Across structured and unstructured sparsity on ImageNet, PASCAL VOC, and CIFAR-10, IEE achieves state-of-the-art accuracy with substantial training-cost reductions, including notable gains over the HALP and RigL baselines. The method can start from scratch or from a pretrained model and extends to latency-aware structured pruning via HALP, providing practical, hardware-conscious sparsification; analyses of architecture convergence and grown-neuron survival support its exploration behavior.
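The five-stage cycle can be illustrated with a toy sketch. The Python code below is a minimal, hypothetical illustration, not the authors' implementation: it assumes a linear least-squares model, weight magnitude as the single importance criterion, plain gradient descent, a fixed number `k` of weights pruned and grown per step, and that pruned or non-grown exploration weights are reset to zero. The helper names (`mse_grad`, `train`, `iee_step`) and the default step counts `H`, `J`, `Q` are illustrative choices.

```python
import random

def mse_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    n, d = len(X), len(w)
    grad = [0.0] * d
    for xi, yi in zip(X, y):
        r = sum(a * b for a, b in zip(xi, w)) - yi  # residual for one sample
        for j in range(d):
            grad[j] += r * xi[j] / n
    return grad

def train(w, X, y, trainable, steps, lr=0.1):
    """Gradient descent that updates only the indices in `trainable`."""
    for _ in range(steps):
        g = mse_grad(w, X, y)
        for j in trainable:
            w[j] -= lr * g[j]

def iee_step(w, active, X, y, k, H=20, J=20, Q=5):
    """One hypothetical IEE update step with |w| as the only criterion."""
    explore = set(range(len(w))) - active        # inactive weights are zero
    train(w, X, y, active, H)                    # exploit: train active weights
    # Importance estimation + Prune: drop the k smallest-magnitude active weights.
    pruned = sorted(active, key=lambda j: abs(w[j]))[:k]
    for j in pruned:
        active.discard(j)
        w[j] = 0.0
    explore |= set(pruned)
    train(w, X, y, active, J)                    # accuracy improvement
    train(w, X, y, explore, Q)                   # reactivate & explore, active frozen
    # Grow: keep the k exploration weights with the largest magnitude.
    grown = set(sorted(explore, key=lambda j: abs(w[j]), reverse=True)[:k])
    active |= grown
    for j in explore - grown:
        w[j] = 0.0                               # deactivate the rest again
    return active

# Toy data: 8 features, only features 1 and 5 matter.
random.seed(0)
d, n = 8, 400
w_true = [0.0] * d
w_true[1], w_true[5] = 2.0, -3.0
X = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [sum(a * b for a, b in zip(xi, w_true)) for xi in X]

w = [0.0] * d
active = {0, 2}  # deliberately poor initial sparse structure
for _ in range(3):
    active = iee_step(w, active, X, y, k=1)
print(sorted(active))
```

Because pruning and growing rank weights with the same magnitude criterion, a weight pruned in one step can be regrown as soon as the reactivation preview shows it is useful, so removals are no longer irreversible.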

Abstract

Pruning aims to accelerate and compress models by removing redundant parameters, identified by specifically designed importance scores, which are usually imperfect. This removal is irreversible, often leading to subpar performance in pruned models. Dynamic sparse training, while attempting to adjust sparse structures during training for continual reassessment and refinement, has several limitations, including criterion inconsistency between pruning and growth, unsuitability for structured sparsity, and short-sighted growth strategies. Our paper introduces an efficient, innovative paradigm to enhance a given importance criterion for either unstructured or structured sparsity. Our method separates the model into an active structure for exploitation and an exploration space for potential updates. During exploitation, we optimize the active structure, whereas during exploration, we reevaluate and reintegrate parameters from the exploration space through a pruning and growing step consistently guided by the same given importance criterion. To prepare for exploration, we briefly "reactivate" all parameters in the exploration space and train them for a few iterations while keeping the active part frozen, offering a preview of the potential performance gains from reintegrating these parameters. We show on various datasets and configurations that existing importance criteria, even one as simple as magnitude, can be enhanced with our method to achieve state-of-the-art performance and training cost reductions. Notably, on ImageNet with ResNet50, our method achieves a +1.3 improvement in Top-1 accuracy over prior art at 90% ERK sparsity. Compared with HALP, the state-of-the-art latency-aware pruning method, we reduce its training cost by over 70% while obtaining a faster and more accurate pruned model.
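The 90% ERK setting refers to a global sparsity target distributed across layers by the Erdős-Rényi-Kernel rule used in RigL evci2020rigging. The sketch below assumes that definition: each layer's density is proportional to the sum of its weight-tensor dimensions divided by their product, scaled by a global factor so that the kept weights match the target, with any layer that would exceed full density kept dense. The function name `erk_densities` and the example layer shapes are hypothetical.

```python
from math import prod

def erk_densities(shapes, sparsity):
    """Per-layer densities under the Erdos-Renyi-Kernel (ERK) rule.

    A layer with weight shape (d1, ..., dm) gets density eps * sum(d) / prod(d);
    eps is chosen so that kept weights equal (1 - sparsity) * total weights.
    Layers whose density would exceed 1 are kept dense and eps is recomputed.
    """
    assert 0.0 < sparsity < 1.0
    n = len(shapes)
    sizes = [prod(s) for s in shapes]
    scores = [sum(s) / prod(s) for s in shapes]
    budget = (1.0 - sparsity) * sum(sizes)    # number of weights to keep
    dense = set()
    while True:
        spent = sum(sizes[i] for i in dense)  # weights used by dense layers
        denom = sum(scores[i] * sizes[i] for i in range(n) if i not in dense)
        eps = (budget - spent) / denom
        overflow = {i for i in range(n) if i not in dense and eps * scores[i] > 1.0}
        if not overflow:
            break
        dense |= overflow                     # eps only grows after this
    return [1.0 if i in dense else eps * scores[i] for i in range(n)]

# Hypothetical ResNet-like layers: conv (out, in, kh, kw) and a final FC (out, in).
shapes = [(64, 3, 7, 7), (64, 64, 3, 3), (128, 64, 3, 3), (1000, 2048)]
dens = erk_densities(shapes, 0.9)
kept = sum(d * prod(s) for d, s in zip(dens, shapes))
print([round(d, 3) for d in dens], kept / sum(prod(s) for s in shapes))
```

Under this rule, small layers such as the first convolution (only 3 input channels) end up much denser than large ones such as the final fully connected layer, while the overall kept fraction still matches the 10% budget.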

Paper Structure

This paper contains 22 sections, 12 equations, 5 figures, 5 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Our unstructured and structured pruning results on ImageNet1K. Left: unstructured weight sparsity at different pruning ratios, plotted against FLOPs and training cost (top-left is better). Right: structured pruning targeting various latency constraints, plotted against inference frames per second (FPS; top-right is better) and training cost (top-left is better). FPS is measured on an NVIDIA Titan V GPU. Training costs are reported relative to training a baseline ResNet50.
  • Figure 2: Overview of our method. In each IEE update step, we first train the active weights $\Theta_K$ for $H$ steps and then prune a number of connections from $\Theta_K$. Next, we train the remaining weights $\Theta_K$ for $J$ steps to better exploit the current architecture. To explore a potentially better sparse architecture, we temporarily activate the exploration space $\Theta_P$ and train its weights for $Q$ steps while freezing $\Theta_K$. We then evaluate the importance scores of the activated $\Theta_P$ and grow the top-ranked weights. This completes one IEE update step, which is repeated until the update period ends.
  • Figure 3: ImageNet1K structured sparsity results on MobileNet-V1 as a function of FPS (left, top-right is better) and training cost (right, top-left is better). FPS is measured on an NVIDIA Titan V GPU; training costs are reported relative to dense MobileNet-V1.
  • Figure 4: PASCAL VOC structured sparsity results on SSD512-RN50 as a function of FPS (left, top-right is better) and training cost (right, top-left is better). FPS is measured on an NVIDIA Titan V GPU; training costs are reported relative to dense SSD512-RN50.
  • Figure 5: (a) Architecture convergence, measured by IoU after pruning and growing; (b) grown-neuron survival rate for our method and RigL evci2020rigging.
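The two diagnostics in Figure 5 can be computed from the sequence of active sets. The sketch below is a hypothetical reading of them, since this overview does not spell out the exact definitions: it assumes IoU compares the active set before and after one prune-and-grow step (values near 1 mean the architecture has stopped changing), and that a grown unit "survives" if it is still active in the final model, counted once per grow event. Units can be weight indices for unstructured sparsity or channel indices for structured sparsity; the function names are illustrative.

```python
def mask_iou(before, after):
    """Intersection-over-union of two active sets of unit indices."""
    union = before | after
    return len(before & after) / len(union) if union else 1.0

def survival_rate(grown_per_step, final_active):
    """Share of grow events whose unit is still active in the final model."""
    total = sum(len(g) for g in grown_per_step)
    survived = sum(len(g & final_active) for g in grown_per_step)
    return survived / total if total else 0.0

# Active sets after successive prune-and-grow steps (toy example).
masks = [{0, 1, 2, 3}, {0, 1, 2, 4}, {0, 1, 2, 4}]
ious = [mask_iou(a, b) for a, b in zip(masks, masks[1:])]

# Separate toy grow history: unit 4 grown first, then units 5 and 6.
rate = survival_rate([{4}, {5, 6}], final_active={0, 1, 2, 4, 5})
print(ious, rate)
```

Under these assumptions, an IoU rising toward 1 over training indicates that the sparse structure is converging, and a higher survival rate suggests that grown units were genuinely useful rather than pruned again shortly after being added.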