The Neural Pruning Law Hypothesis
Eugen Barbulescu, Antonio Alexoaie, Lucian Busoniu
TL;DR
This work asks how neural network pruning can be characterized uniformly. It proposes Hyperflux, a principled flux-based mechanism that combines weight flux with a global pressure to reveal weight importance. It introduces an $L_0$ pruning framework and reports a density-flux power-law relation, $\ln(s)=\ln(c)-\alpha_0\ln(\gamma)$, which the authors hypothesize holds for any effective saliency metric. Empirically, Hyperflux achieves competitive or superior sparsity-accuracy tradeoffs on CIFAR-10/100 and ImageNet-1K across magnitude, gradient, and $L_0$ pruning families, suggesting a unifying property of neural pruning. The work lays a foundation for principled sparse subnetwork discovery, with potential impact on deploying efficient models on resource-constrained devices, and may inform future research in broader domains such as NLP and reinforcement learning.
Abstract
Network pruning is used to reduce inference latency and power consumption in large neural networks. However, most current pruning methods rely on ad-hoc heuristics that are poorly understood. We introduce Hyperflux, a conceptually grounded pruning method, and use it to study the pruning process. Hyperflux models this process as an interaction between weight flux, the gradient's response to the weight's removal, and network pressure, a global regularization driving weights towards pruning. We postulate properties that arise naturally from our framework and find that the relationship between the minimum flux among weights and the density follows a power-law equation. Furthermore, we hypothesize that this power-law relationship holds for any effective saliency metric and call this idea the Neural Pruning Law Hypothesis. We validate our hypothesis on several families of pruning methods (magnitude, gradient, and $L_0$), providing a potentially unifying property for neural pruning.
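
A power law of the form $\ln(s)=\ln(c)-\alpha_0\ln(\gamma)$ is a straight line in log-log coordinates, equivalently $s = c\,\gamma^{-\alpha_0}$, so its parameters can be estimated with an ordinary least-squares line fit. The sketch below illustrates that generic fitting step only and is not the authors' procedure. It treats $s$ and $\gamma$ as two arrays of positive measurements without assuming which quantity each denotes, and the function name `fit_power_law` and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def fit_power_law(gamma, s):
    """Estimate (c, alpha0) in ln(s) = ln(c) - alpha0 * ln(gamma).

    Hypothetical helper: a plain least-squares line fit in log-log space.
    """
    gamma = np.asarray(gamma, dtype=float)
    s = np.asarray(s, dtype=float)
    if gamma.shape != s.shape:
        raise ValueError("gamma and s must have the same shape")
    if np.any(gamma <= 0) or np.any(s <= 0):
        raise ValueError("logarithms require strictly positive inputs")

    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    slope, intercept = np.polyfit(np.log(gamma), np.log(s), deg=1)

    # The slope of the fitted line is -alpha0 and the intercept is ln(c).
    return float(np.exp(intercept)), float(-slope)


# Synthetic, noise-free data generated from known parameters (illustrative only).
c_true, alpha_true = 0.5, 0.8
gamma = np.logspace(-3, 0, 20)
s = c_true * gamma ** (-alpha_true)

c_hat, alpha_hat = fit_power_law(gamma, s)
print(f"c = {c_hat:.4f}, alpha0 = {alpha_hat:.4f}")
```

On real measurements the points will scatter around the line, so the residuals of the log-log fit give a rough check of how well the power-law form describes the data.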
