UniPTS: A Unified Framework for Proficient Post-Training Sparsity
Jingjing Xie, Yuxin Zhang, Mingbao Lin, Zhihang Lin, Liujuan Cao, Rongrong Ji
TL;DR
UniPTS addresses the challenge of maintaining high accuracy under Post-Training Sparsity (PTS) with limited calibration data by combining three components: a base-decayed KL loss that transfers knowledge from the dense model, an evolutionary search for a globally effective sparsity distribution that uses a reducing-regrowing mechanism to curb overfitting, and dynamic sparse training that explores sparse structures stably. The base-decayed objective $\mathcal{L}_{DKL}$ adapts the gradient scale over training to sustain guidance from the dense network to the sparse one. The sparsity distribution is determined by an evolutionary process that first reduces the search space to $P_e > P$ and then regrows to meet the global sparsity $P$, with fitness regularized by BN statistics and input noise. Empirically, UniPTS markedly outperforms POT across datasets and architectures: for ResNet-50 on ImageNet at 90% sparsity it reaches 68.6% top-1 accuracy versus 3.9% for POT, and it delivers substantial detection gains for Faster-RCNN and SSD at high sparsity. The findings highlight the value of integrating global objective alignment, robust sparsity distribution search, and dynamic sparse training to enable practical, data-efficient post-training sparsification. Overall, UniPTS offers a principled, scalable framework for proficient PTS with broad applicability to vision tasks and structured sparsity patterns.
Abstract
Post-training Sparsity (PTS) is a recently emerged avenue that pursues efficient network sparsity using only a limited amount of data. Existing PTS methods, however, suffer significant performance degradation compared with traditional methods that retrain the sparse networks on the whole dataset, especially at high sparsity ratios. In this paper, we attempt to close this gap by transposing three cardinal factors that profoundly affect the performance of conventional sparsity into the context of PTS. Our endeavors comprise (1) a base-decayed sparsity objective that promotes efficient knowledge transfer from the dense network to its sparse counterpart; (2) a reducing-regrowing search algorithm designed to find the optimal sparsity distribution while avoiding overfitting to the small calibration set in PTS; and (3) dynamic sparse training built on the preceding two aspects, aimed at comprehensively optimizing the sparsity structure while ensuring training stability. Our proposed framework, termed UniPTS, is validated to be much superior to existing PTS methods across extensive benchmarks. As an illustration, it raises the performance of POT, a recently proposed recipe, from 3.9% to 68.6% when pruning ResNet-50 at a 90% sparsity ratio on ImageNet. We release the code of our paper at https://github.com/xjjxmu/UniPTS.
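To make the idea behind the base-decayed objective concrete, the following is a minimal sketch and not the authors' exact $\mathcal{L}_{DKL}$, whose form is not given in this section. It assumes that "base" refers to the base of the logarithm in the KL divergence between dense and sparse outputs. Under that assumption, a KL divergence computed in base $b$ equals the natural-log KL divided by $\ln b$, so lowering the base over training raises the loss (and gradient) scale. The linear schedule, the endpoint values, and all function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_with_base(p, q, base):
    """KL(p || q) with logarithms taken in the given base (base > 1).

    Equivalent to the natural-log KL divided by ln(base).
    """
    natural_kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return natural_kl / math.log(base)

def decayed_base(step, total_steps, base_start=math.e, base_end=1.5):
    """Hypothetical linear decay of the log base (an assumption, not from the paper)."""
    frac = step / max(total_steps - 1, 1)
    return base_start + frac * (base_end - base_start)

# Toy dense ("teacher") and sparse ("student") output distributions.
dense_probs = softmax([2.0, 0.5, -1.0])
sparse_probs = softmax([1.5, 0.7, -0.5])

# Same pair of distributions, evaluated at successive training steps.
total_steps = 5
losses = [
    kl_with_base(dense_probs, sparse_probs, decayed_base(t, total_steps))
    for t in range(total_steps)
]
```

For fixed dense and sparse outputs, `losses` grows as the base decays. This illustrates how a shrinking base can keep the dense-to-sparse guidance signal from fading as the two outputs get closer during training.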
