Table of Contents
Fetching ...

UniPTS: A Unified Framework for Proficient Post-Training Sparsity

Jingjing Xie, Yuxin Zhang, Mingbao Lin, Zhihang Lin, Liujuan Cao, Rongrong Ji

TL;DR

UniPTS addresses the challenge of maintaining high accuracy under Post-Training Sparsity with limited calibration data by jointly optimizing three facets: a base-decayed KL loss to transfer knowledge from a dense model, an evolutionary search for globally effective sparsity distributions with a reducing-regrowing mechanism to curb overfitting, and dynamic sparsity training to explore sparse structures stably. The base-decayed objective, defined by $\mathcal{L}_{DKL}$, adapts the gradient scale over training to sustain guidance from dense to sparse networks; sparsity distribution is determined via an evolutionary process that first reduces the search space to $P_e > P$, then regrows to meet the global sparsity $P$, with fitness regularized by BN statistics and input noise. Empirically, UniPTS markedly outperforms POT across datasets and architectures, notably improving ResNet-50 on ImageNet at 90% sparsity (e.g., achieving up to 68.6% top-1 accuracy versus POT’s significantly lower performance) and delivering substantial detection gains in Faster-RCNN and SSD at high sparsity. The findings highlight the value of integrating global objective alignment, robust sparsity distribution search, and dynamic sparse training to enable practical, data-efficient run-time sparsification. Overall, UniPTS offers a principled, scalable framework for proficient PTS with broad applicability to vision tasks and structured sparsity patterns.

Abstract

Post-training Sparsity (PTS) is a recently emerged avenue that chases efficient network sparsity with limited data in need. Existing PTS methods, however, undergo significant performance degradation compared with traditional methods that retrain the sparse networks via the whole dataset, especially at high sparsity ratios. In this paper, we attempt to reconcile this disparity by transposing three cardinal factors that profoundly alter the performance of conventional sparsity into the context of PTS. Our endeavors particularly comprise (1) A base-decayed sparsity objective that promotes efficient knowledge transferring from dense network to the sparse counterpart. (2) A reducing-regrowing search algorithm designed to ascertain the optimal sparsity distribution while circumventing overfitting to the small calibration set in PTS. (3) The employment of dynamic sparse training predicated on the preceding aspects, aimed at comprehensively optimizing the sparsity structure while ensuring training stability. Our proposed framework, termed UniPTS, is validated to be much superior to existing PTS methods across extensive benchmarks. As an illustration, it amplifies the performance of POT, a recently proposed recipe, from 3.9% to 68.6% when pruning ResNet-50 at 90% sparsity ratio on ImageNet. We release the code of our paper at https://github.com/xjjxmu/UniPTS.

UniPTS: A Unified Framework for Proficient Post-Training Sparsity

TL;DR

UniPTS addresses the challenge of maintaining high accuracy under Post-Training Sparsity with limited calibration data by jointly optimizing three facets: a base-decayed KL loss to transfer knowledge from a dense model, an evolutionary search for globally effective sparsity distributions with a reducing-regrowing mechanism to curb overfitting, and dynamic sparsity training to explore sparse structures stably. The base-decayed objective, defined by , adapts the gradient scale over training to sustain guidance from dense to sparse networks; sparsity distribution is determined via an evolutionary process that first reduces the search space to , then regrows to meet the global sparsity , with fitness regularized by BN statistics and input noise. Empirically, UniPTS markedly outperforms POT across datasets and architectures, notably improving ResNet-50 on ImageNet at 90% sparsity (e.g., achieving up to 68.6% top-1 accuracy versus POT’s significantly lower performance) and delivering substantial detection gains in Faster-RCNN and SSD at high sparsity. The findings highlight the value of integrating global objective alignment, robust sparsity distribution search, and dynamic sparse training to enable practical, data-efficient run-time sparsification. Overall, UniPTS offers a principled, scalable framework for proficient PTS with broad applicability to vision tasks and structured sparsity patterns.

Abstract

Post-training Sparsity (PTS) is a recently emerged avenue that chases efficient network sparsity with limited data in need. Existing PTS methods, however, undergo significant performance degradation compared with traditional methods that retrain the sparse networks via the whole dataset, especially at high sparsity ratios. In this paper, we attempt to reconcile this disparity by transposing three cardinal factors that profoundly alter the performance of conventional sparsity into the context of PTS. Our endeavors particularly comprise (1) A base-decayed sparsity objective that promotes efficient knowledge transferring from dense network to the sparse counterpart. (2) A reducing-regrowing search algorithm designed to ascertain the optimal sparsity distribution while circumventing overfitting to the small calibration set in PTS. (3) The employment of dynamic sparse training predicated on the preceding aspects, aimed at comprehensively optimizing the sparsity structure while ensuring training stability. Our proposed framework, termed UniPTS, is validated to be much superior to existing PTS methods across extensive benchmarks. As an illustration, it amplifies the performance of POT, a recently proposed recipe, from 3.9% to 68.6% when pruning ResNet-50 at 90% sparsity ratio on ImageNet. We release the code of our paper at https://github.com/xjjxmu/UniPTS.
Paper Structure (16 sections, 6 equations, 4 figures, 7 tables, 1 algorithm)

This paper contains 16 sections, 6 equations, 4 figures, 7 tables, 1 algorithm.

Figures (4)

  • Figure 1: Comparison between POT and our UniPTS framework. Left shows that POT uses predefined sparsity distribution to obtain a fixed sparse structure and retrains the pruned layer with the local sparsity objective. But UniPTS(Right) searches the optimal sparsity distribution and leverages the global sparsity objective and dynamic sparsity training to explore optimal sparse structures.
  • Figure 2: Overview about our method. Left:An overview of our method to search for sparsity distribution. We use evolutionary search to deal with the vast solution space. Right:We provide intuition for the training process. After finding optimal sparsity distribution, we use base-decayed KD loss and dynamic sparse training to retrain the pruned network.
  • Figure 3: Sparsity distribution obtained by different methods.
  • Figure 4: Performance influence of the sparse training strategy.