How Sparse Can We Prune A Deep Network: A Fundamental Limit Perspective
Qiaozhe Zhang, Ruijie Zhang, Jun Sun, Yingzhuang Liu
TL;DR
To address the fundamental limit of pruning in deep networks, the paper formulates pruning as a sparsity-constrained loss feasibility problem using the loss sublevel set $S(\epsilon)$ and analyzes it with convex-geometry tools (Gaussian width, statistical dimension) and the Approximate Kinematics Formula. It derives computable lower and upper bounds on the pruning ratio, revealing a sharp fundamental limit that depends on weight magnitude ($\|\mathbf{w}^*-\mathbf{w}^k\|$) and network sharpness (the trace of the Hessian, $\mathrm{Tr}(H)$). An $l_1$-regularization-based one-shot magnitude pruning (LOMP) scheme is proposed and paired with improved Hessian-spectrum estimation to approach the limit; experiments with ResNet, AlexNet, and VGG models on CIFAR and TinyImageNet show close agreement between theory and practice. The results also offer rigorous interpretations of existing pruning heuristics (e.g., gradual pruning and the role of $l_2$ regularization) and provide practical guidance for achieving near-optimal pruning without significant accuracy loss.
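To make the "one-shot magnitude pruning" step concrete, the following is a minimal sketch of global magnitude pruning on plain Python lists. It is not the authors' LOMP implementation: the $l_1$-regularized training is assumed to have already happened, and the function name `global_magnitude_prune`, the parameter `keep_ratio` (fraction of weights retained), and the toy layer names are hypothetical.

```python
from typing import Dict, List


def global_magnitude_prune(
    layers: Dict[str, List[float]], keep_ratio: float
) -> Dict[str, List[float]]:
    """Zero all but the globally largest-magnitude weights (illustrative only).

    keep_ratio is a hypothetical parameter: the fraction of weights retained.
    """
    if not 0.0 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must lie in [0, 1]")

    # Rank every weight in the network by magnitude, ignoring layer boundaries.
    indexed = [
        (abs(w), name, i)
        for name, weights in layers.items()
        for i, w in enumerate(weights)
    ]
    k = round(keep_ratio * len(indexed))

    # Stable sort by magnitude, largest first; ties keep their original order.
    indexed.sort(key=lambda t: t[0], reverse=True)
    keep = {(name, i) for _, name, i in indexed[:k]}

    # One shot: a single global threshold, no retraining between steps.
    return {
        name: [w if (name, i) in keep else 0.0 for i, w in enumerate(weights)]
        for name, weights in layers.items()
    }


# Toy weights standing in for a trained network (assumed values).
layers = {"conv1": [0.9, -0.05, 0.3], "fc": [-1.2, 0.01, 0.2, -0.4]}
pruned = global_magnitude_prune(layers, keep_ratio=3 / 7)
print(pruned)
```

Here the three largest-magnitude weights (-1.2, 0.9, -0.4) survive regardless of which layer they belong to, which is what distinguishes global pruning from per-layer pruning.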
Abstract
Network pruning is a commonly used measure to alleviate the storage and computational burden of deep neural networks. However, a characterization of the fundamental limit of network pruning is still lacking. To close this gap, in this work we take a first-principles approach, i.e., we directly impose the sparsity constraint on the loss function and leverage the framework of statistical dimension in convex geometry, which enables us to characterize the sharp phase transition point that can be regarded as the fundamental limit of the pruning ratio. Through this limit, we identify two key factors that determine the pruning ratio limit, namely, weight magnitude and network sharpness. Generally speaking, the flatter the loss landscape or the smaller the weight magnitude, the smaller the pruning ratio. Moreover, we provide efficient countermeasures to address the challenges in computing the pruning limit, which mainly involve the accurate spectrum estimation of a large-scale and non-positive Hessian matrix. In addition, through the lens of the pruning ratio threshold, we provide rigorous interpretations of several heuristics in existing pruning algorithms. Extensive experiments demonstrate that our theoretical pruning ratio threshold coincides very well with the empirical results. All code is available at: https://github.com/QiaozheZhang/Global-One-shot-Pruning
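Since network sharpness enters through the Hessian, a small illustration of why it is computable at scale may help. The sketch below uses the standard Hutchinson estimator for the trace, which needs only Hessian-vector products and does not require the Hessian to be positive definite. It is not the paper's improved spectrum-estimation procedure; the names `hutchinson_trace` and `toy_hvp` and the 3x3 matrix are assumptions made for illustration.

```python
import random
from typing import Callable, List

Vector = List[float]


def hutchinson_trace(
    hvp: Callable[[Vector], Vector], dim: int, num_probes: int = 100, seed: int = 0
) -> float:
    """Estimate Tr(H) as the average of v^T H v over random +/-1 probes v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        # Rademacher probe: each entry is +1 or -1 with equal probability.
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hv = hvp(v)  # only a Hessian-vector product is needed, never H itself
        total += sum(vi * hvi for vi, hvi in zip(v, hv))
    return total / num_probes


# Toy symmetric, indefinite matrix standing in for a network Hessian (assumed).
H = [[2.0, 0.5, 0.0], [0.5, -1.0, 0.3], [0.0, 0.3, 4.0]]


def toy_hvp(v: Vector) -> Vector:
    """Hessian-vector product for the toy matrix above."""
    return [sum(h_ij * v_j for h_ij, v_j in zip(row, v)) for row in H]


estimate = hutchinson_trace(toy_hvp, dim=3, num_probes=2000)
exact = sum(H[i][i] for i in range(3))
print(f"estimate={estimate:.3f}, exact={exact:.3f}")
```

In a real network, `toy_hvp` would be replaced by an automatic-differentiation Hessian-vector product, so the full Hessian never has to be stored.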
