Learning effective pruning at initialization from iterative pruning

Shengkai Liu, Yaofeng Cheng, Fusheng Zha, Wei Guo, Lining Sun, Zhenshan Bing, Chenguang Yang

TL;DR

This work tackles the efficiency gap in pruning at initialization by learning a data-driven PaI criterion. It proposes AutoSparse, an end-to-end network that predicts per-parameter scores from initial features, trained on a ground-truth dataset generated by one-time iterative rewind pruning (IRP). Across multiple models and datasets, AutoSparse outperforms handcrafted PaI baselines at high sparsity and demonstrates generalization to unseen architectures with a single IRP-derived dataset. The study also performs extensive ablations to understand the influence of input features, datasets, and pruning iterations, revealing that combined parameter and gradient information yields best results and that dataset quality critically affects performance. These findings indicate a practical, scalable pathway to improve PaI and offer insights into neural network learning tendencies relevant to pruning strategies.

Abstract

Pruning at initialization (PaI) reduces training costs by removing weights before training, which becomes increasingly important as networks grow. However, current PaI methods still show a large accuracy gap relative to iterative pruning, especially at high sparsity levels. This raises an intriguing question: can iterative pruning inspire better PaI? In the lottery ticket hypothesis, iterative rewind pruning (IRP) finds subnetworks retroactively by rewinding the parameters to their original initialization in every pruning iteration, which means all the subnetworks are based on the initial state. Here, we hypothesise that the surviving subnetworks are more important, and we use the relationship between the initial features and their surviving score as the PaI criterion. We employ an end-to-end neural network, AutoSparse (AutoS), to learn this correlation: it takes the model's initial features as input, outputs a score for each parameter, and the lowest-scoring parameters are then pruned before training. To validate the accuracy and generalization of our method, we performed PaI across various models. Results show that our approach outperforms existing methods in high-sparsity settings. Notably, because the underlying logic of model pruning is consistent across models, IRP needs to be run only once on one model (e.g., after a single IRP run on ResNet-18/CIFAR-10, AutoS generalizes to VGG-16/CIFAR-10, ResNet-18/TinyImageNet, etc.). As this is the first neural network-based PaI method, we conduct extensive experiments to examine the factors that influence the approach. These results reveal the learning tendencies of neural networks and provide new insights into the understanding and study of PaI from a practical perspective. Our code is available at: https://github.com/ChengYaofeng/AutoSparse.git.
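
The IRP loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `iterative_rewind_pruning`, the stand-in `toy_train`, magnitude-based pruning within each round, and the definition of the surviving score as the fraction of rounds a weight survives are all assumptions made here for clarity.

```python
def iterative_rewind_pruning(init_weights, train_fn, rounds, prune_frac):
    """Return a per-parameter surviving score in [0, 1].

    Assumption: the score is the fraction of pruning rounds a weight survived.
    """
    n = len(init_weights)
    mask = [1] * n          # 1 = weight is still alive
    survived = [0] * n      # number of rounds each weight has survived
    for _ in range(rounds):
        # Rewind: every round restarts from the original initialization,
        # with previously pruned weights zeroed out.
        weights = [w * m for w, m in zip(init_weights, mask)]
        trained = train_fn(weights, mask)
        # Assumed criterion: prune the smallest-magnitude trained weights
        # among those that are still alive.
        alive = [i for i in range(n) if mask[i]]
        k = int(len(alive) * prune_frac)
        for i in sorted(alive, key=lambda i: abs(trained[i]))[:k]:
            mask[i] = 0
        for i in range(n):
            survived[i] += mask[i]
    return [s / rounds for s in survived]


def toy_train(weights, mask):
    """Hypothetical stand-in for real training: just rescales the weights."""
    return [2.0 * w * m for w, m in zip(weights, mask)]


init = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8]
scores = iterative_rewind_pruning(init, toy_train, rounds=3, prune_frac=0.25)
```

In this toy run, weights pruned in the first round receive a score of 0 and weights that survive every round receive a score of 1. Pairing each initial weight with such a score gives the kind of supervised target a score predictor could be trained on.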

Paper Structure

This paper contains 44 sections, 5 equations, 6 figures, 9 tables, 2 algorithms.

Figures (6)

  • Figure 1: Visualization of the surviving score. The score is obtained from IRP on LeNet-300-100 trained on MNIST. (a) The correlation between the surviving score and the initial parameters. (b) Comparison of the surviving score with SNIP [lee2018snip] and GraSP [wang2020picking].
  • Figure 2: (a) illustrates the process of obtaining the surviving score, which also serves as the generation process for the AutoS dataset $\mathcal{D}_{\mathcal{A}}$. $\mathcal{D}_{\mathcal{A}}$ consists of parameters after initialization and gradients obtained based on the pruning dataset. (b) demonstrates how AutoS predicts these scores. Specifically, features from the initial network are concatenated and input into AutoS, which then predicts the parameter scores.
  • Figure 3: Accuracy of different PaI methods when pruning to various sparsities.
  • Figure 4: Visualization of score prediction. (a) shows the prediction results before AutoS training; (b) shows the prediction results after AutoS training.
  • Figure 5: Visualization of ablation score prediction. (a) and (b) show the prediction results of Only Grad and Only Param, respectively.
  • ...and 1 more figure
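
The prediction-and-prune step in Figure 2(b) can be sketched roughly as below. This is an illustrative sketch only: `pai_mask` and `placeholder_score` are hypothetical names, and the placeholder scoring function merely stands in for the trained AutoS network, whose architecture is not described in this section.

```python
def pai_mask(params, grads, score_fn, sparsity):
    """Build a binary keep-mask by pruning the lowest-scoring parameters.

    params, grads: per-parameter values taken from the initial network.
    score_fn: maps one (param, grad) feature pair to a scalar score.
    sparsity: fraction of parameters to remove before training.
    """
    # Concatenate the initial features of each parameter into one input.
    features = list(zip(params, grads))
    scores = [score_fn(f) for f in features]
    n_prune = int(len(scores) * sparsity)
    # Indices sorted from lowest to highest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    mask = [1] * len(scores)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def placeholder_score(feature):
    """Hypothetical stand-in for the learned predictor (not AutoS itself)."""
    param, grad = feature
    return abs(param * grad)


params = [0.5, -0.1, 0.3, 0.9]
grads = [0.2, 0.4, -0.1, 0.05]
mask = pai_mask(params, grads, placeholder_score, sparsity=0.5)
```

Here the two lowest-scoring parameters are masked out and the rest are kept for training. Replacing `placeholder_score` with a trained predictor would leave the rest of the sketch unchanged.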