$\rm SP^3$: Enhancing Structured Pruning via PCA Projection
Yuxuan Hu, Jing Zhang, Zhe Zhao, Chen Zhao, Xiaodong Chen, Cuiping Li, Hong Chen
TL;DR
The paper addresses the underexplored potential of pruning the transformer hidden dimension $d$ in pre-trained language models. It introduces SP$^3$, which projects features into a PCA-defined subspace before masking, and adds residual linear transformations to allow layer-specific hidden-dimension pruning; this combination yields strong compression with minimal accuracy loss. Empirical results on GLUE and SQuAD show SP$^3$ achieving about $70\%$ hidden-dimension reduction and $94\%$ overall compression of BERT$_{\rm base}$ while maintaining $\geq 96\%$ of performance, outperforming prior methods by up to ~6 percentage points in accuracy at the same compression. The approach also extends to OPT and Llama, and the authors discuss practical considerations, limitations, and potential improvements such as Group PCA Projection for large language models.
Abstract
Structured pruning is a widely used technique for reducing the size of pre-trained language models (PLMs), but current methods often overlook the potential of compressing the hidden dimension $d$ in PLMs, a dimension critical to model size and efficiency. This paper introduces a novel structured pruning approach, Structured Pruning with PCA Projection (SP$^3$), which targets the effective reduction of $d$ by projecting features into a space defined by principal components before masking. Extensive experiments on benchmarks (GLUE and SQuAD) show that SP$^3$ can reduce $d$ by 70%, compress 94% of the BERT$_{\rm base}$ model, and maintain over 96% accuracy; at the same compression ratio, it outperforms other methods that compress $d$ by 6% in accuracy. SP$^3$ has also proven effective with other models, including OPT and Llama. Our data and code are available at an anonymous repo.
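The central idea of projecting features onto principal components before masking can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `pca_basis` and `project_and_mask` are hypothetical, the features are synthetic low-rank toy data, and a fixed "keep the top-`keep` components" mask stands in for whatever mask the pruning procedure would actually select. It only shows why masking in the PCA basis can discard many dimensions at little cost when the information is spread across all raw coordinates.

```python
import numpy as np


def pca_basis(features):
    """Return (mean, components); columns of `components` are principal
    directions sorted by decreasing explained variance."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt.T


def project_and_mask(features, mean, components, mask):
    """Project into the PCA basis, zero the masked coordinates, map back."""
    coords = (features - mean) @ components
    return (coords * mask) @ components.T + mean


rng = np.random.default_rng(0)
n, d, rank, keep = 512, 64, 8, 16  # toy sizes, chosen only for illustration

# Synthetic features: a rank-8 signal spread over all d coordinates, plus noise.
signal = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, d))
features = signal + 0.01 * rng.normal(size=(n, d))

mean, components = pca_basis(features)

# Assumption: a fixed mask keeping the first `keep` coordinates.
mask = np.zeros(d)
mask[:keep] = 1.0

pruned_pca = project_and_mask(features, mean, components, mask)
pruned_raw = features * mask  # same mask applied to the raw hidden coordinates

norm = np.linalg.norm(features)
err_pca = np.linalg.norm(features - pruned_pca) / norm
err_raw = np.linalg.norm(features - pruned_raw) / norm
print(f"relative error, mask after PCA projection: {err_pca:.4f}")
print(f"relative error, mask on raw coordinates:   {err_raw:.4f}")
```

On this toy data, keeping 16 of 64 coordinates in the PCA basis leaves the features almost unchanged, whereas dropping the same number of raw coordinates removes most of the signal. Real PLM features are not exactly low-rank, so the gap in practice would depend on how quickly their variance decays across components.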
