Subspace Node Pruning
Joshua Offergeld, Marcel van Gerven, Nasir Ahmad
TL;DR
This work tackles the challenge of reducing neural network inference cost without sacrificing accuracy. It introduces Subspace Node Pruning (SNP), which orthogonalizes layer activations via a lower-triangular transform, uses linear least squares to reconstruct the impact of pruned units, and determines pruning ratios from cumulative variance. A novel ordering based on unnormalized ZCA captures unit redundancy, and LDL-based subspace transforms enable automatic, globally coordinated pruning across layers. The method achieves state-of-the-art or competitive results on ImageNet models (VGG-16, ResNet-50, DeiT) and demonstrates effective one-shot pruning on OPT language models. It does so with substantially reduced compute and without heavy per-layer tuning. Overall, SNP provides a simple, interpretable, and scalable framework for pruning both CNNs and transformers, with potential for further refinement during training or for dynamic pruning.
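A minimal NumPy sketch of the orthogonalize, prune and reconstruct idea summarized above follows. It illustrates the idea under stated assumptions and is not the authors' code. Units are taken in a fixed, given order, and the unnormalized-ZCA ordering is not reproduced. The LDL factors of the uncentered second-moment matrix are obtained from a Cholesky factorization, and the first k units are kept. The least-squares reconstruction of the pruned units is then folded into the next linear layer's weights, which is one natural way to recover their impact. The helper names ldl_from_activations and prune_and_fold, the ridge term eps and the toy data are all hypothetical.

```python
import numpy as np


def ldl_from_activations(X, eps=1e-8):
    """Return unit lower-triangular L and diagonal d with X^T X / n + eps*I = L diag(d) L^T.

    Hypothetical helper: uses the uncentered second-moment matrix (so no bias
    correction is needed) plus a small ridge eps so Cholesky is well defined.
    """
    n, m = X.shape
    C = X.T @ X / n + eps * np.eye(m)
    R = np.linalg.cholesky(C)      # C = R R^T with R lower-triangular
    r = np.diag(R)
    L = R / r                      # divide column j by R[j, j] -> unit diagonal
    d = r ** 2                     # variances of the orthogonal components
    return L, d


def prune_and_fold(X, W_next, k, eps=1e-8):
    """Keep the first k units of X and fold the pruned units into W_next.

    X:      (n, m) activations of the layer being pruned, units in pruning order.
    W_next: (out, m) weights of the following linear layer, y = x @ W_next.T.
    Returns an (out, k) weight matrix that acts on the kept units only.
    """
    L, _ = ldl_from_activations(X, eps)
    L_kk = L[:k, :k]               # kept units in terms of kept components
    L_pk = L[k:, :k]               # pruned units' loadings on kept components
    # Map from kept to pruned activations, X_p ~= X_k @ B, with
    # B = L_kk^{-T} L_pk^T, identical to the (ridge-)least-squares solution.
    B = np.linalg.solve(L_kk.T, L_pk.T)
    return W_next[:, :k] + W_next[:, k:] @ B.T


# Toy usage: the last m - k units are exact linear mixtures of the first k.
rng = np.random.default_rng(0)
n, m, out, k = 2000, 8, 4, 5
base = rng.standard_normal((n, k))
X = np.hstack([base, base @ rng.standard_normal((k, m - k))])
W_next = rng.standard_normal((out, m))
W_new = prune_and_fold(X, W_next, k)
err = np.abs(X @ W_next.T - X[:, :k] @ W_new.T).max()
print(f"max output error after pruning {m - k} of {m} units: {err:.2e}")
```

Because the pruned units in this toy example are fully redundant, the folded weights reproduce the original outputs up to the tiny ridge term. With only partially redundant units, the same code returns the least-squares approximation instead. The triangular factors alone suffice because B = L_kk^{-T} L_pk^T coincides with the normal-equation solution C_kk^{-1} C_kp.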
Abstract
Improving the efficiency of neural network inference is undeniably important at a time when commercial use of AI models increases daily. Node pruning is the art of removing computational units such as neurons, filters, attention heads, or even entire layers to significantly reduce inference time while retaining network performance. In this work, we propose projecting unit activations onto an orthogonal subspace in which there is no redundant activity. Within this subspace we may prune nodes while simultaneously recovering the impact of lost units via linear least squares. We furthermore show that the order in which units are orthogonalized can be optimized to maximally rank units by their redundancy. Finally, we leverage these orthogonal subspaces to automatically determine layer-wise pruning ratios based upon the relative scale of node activations in our subspace, which is equivalent to cumulative variance. Our method matches or exceeds state-of-the-art pruning results on ImageNet-trained VGG-16, ResNet-50 and DeiT models while having up to 24x lower computational cost than alternative methods. We also demonstrate that this method can be applied in a one-shot manner to OPT large language models, again outperforming competing methods.
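The link between the relative scale of activations in the orthogonal subspace and cumulative variance can be checked numerically. The sketch below assumes a fixed unit order and centered activations, and it factors the covariance as C = L diag(d) L^T with L unit lower-triangular. The orthogonal components are mutually uncorrelated, so the total variance trace(C) splits exactly into per-component terms d_j * ||L[:, j]||^2. The fraction of total variance that the first k units can reconstruct by least squares is therefore a running sum of these terms. The function layer_keep_count and the 0.99 threshold are hypothetical. The paper's exact scoring rule and its coordination of ratios across layers are not reproduced here.

```python
import numpy as np


def ldl_factors(C):
    """Unit lower-triangular L and diagonal d with C = L diag(d) L^T (C positive definite)."""
    R = np.linalg.cholesky(C)
    r = np.diag(R)
    return R / r, r ** 2


def layer_keep_count(X, threshold=0.99, eps=1e-8):
    """Smallest k such that the first k units retain `threshold` of the total variance.

    X: (n, m) activations with units already in pruning order (hypothetical setup).
    Returns (k, frac), where frac[i] is the variance fraction kept by i + 1 units.
    """
    Xc = X - X.mean(axis=0)
    n, m = Xc.shape
    C = Xc.T @ Xc / n + eps * np.eye(m)
    L, d = ldl_factors(C)
    # Variance carried by orthogonal component j across all units.
    contrib = d * np.sum(L ** 2, axis=0)
    frac = np.cumsum(contrib) / np.sum(contrib)
    # First index where the cumulative fraction reaches the threshold, as a count.
    k = int(np.searchsorted(frac, threshold)) + 1
    return min(k, m), frac


# Toy usage: units 3-5 are noisy mixtures of units 0-2 and add little new variance.
rng = np.random.default_rng(1)
n = 5000
a = rng.standard_normal((n, 3))
noise = 1e-3 * rng.standard_normal((n, 3))
X = np.hstack([a, a @ rng.standard_normal((3, 3)) + noise])
k, frac = layer_keep_count(X, threshold=0.99)
print(k, np.round(frac, 4))
```

In this toy example the three near-duplicate units carry only noise-level variance, so the sketch keeps k = 3 units.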
