CORP: Closed-Form One-shot Representation-Preserving Structured Pruning for Vision Transformers
Boxiang Zhang, Baijian Yang
TL;DR
CORP addresses the deployment of Vision Transformers under strict post-training constraints with a closed-form, one-shot structured pruning method that preserves representations. It reframes pruning as a representation-recovery problem: removed activations and attention logits are modeled as affine functions of the retained components, and the resulting compensation is folded into the weights without gradients or fine-tuning. This allows both MLP and attention structures to be pruned using only a small unlabeled calibration set. The approach preserves accuracy on ImageNet across DeiT sizes (e.g., 82.8% Top-1 on DeiT-Huge with 50% of both MLP and attention structures pruned) and delivers real hardware speedups without retraining. By shifting the focus from importance ranking to explicit representation compensation, CORP provides a scalable, deployment-friendly way to compress Vision Transformers.
Abstract
Vision Transformers achieve strong accuracy but incur high compute and memory cost. Structured pruning can reduce inference cost, but most methods rely on retraining or multi-stage optimization. These requirements limit post-training deployment. We propose CORP, a closed-form one-shot structured pruning framework for Vision Transformers. CORP removes entire MLP hidden dimensions and attention substructures without labels, gradients, or fine-tuning. It operates under strict post-training constraints using only a small unlabeled calibration set. CORP formulates structured pruning as a representation recovery problem. It models removed activations and attention logits as affine functions of retained components and derives closed-form ridge regression solutions that fold compensation into model weights. This minimizes expected representation error under the calibration distribution. Experiments on ImageNet with DeiT models show strong redundancy in MLP and attention representations. Without compensation, one-shot structured pruning causes severe accuracy degradation. With CORP, models preserve accuracy under aggressive sparsity. On DeiT-Huge, CORP retains 82.8% Top-1 accuracy after pruning 50% of both MLP and attention structures. CORP completes pruning in under 20 minutes on a single GPU and delivers substantial real-world efficiency gains.
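To make the idea of affine compensation concrete, the following is a minimal NumPy sketch for the MLP case only. It is an illustration under stated assumptions, not the authors' implementation: the function names, the ridge strength `lam`, the centering step, and the synthetic calibration data are all hypothetical, and the attention-logit compensation is not covered. The sketch fits pruned hidden activations as an affine function of the kept ones by ridge regression, then folds the result into the output projection so that no extra computation remains at inference time.

```python
import numpy as np

def ridge_affine_compensation(H, keep, prune, lam=1e-3):
    """Fit h_prune ~= A @ h_keep + c on calibration activations H of shape (n, d).

    Assumed formulation: ridge regression on centered activations,
    with the penalty applied to A only (the offset c is unpenalized).
    """
    Hk, Hp = H[:, keep], H[:, prune]
    mu_k, mu_p = Hk.mean(axis=0), Hp.mean(axis=0)
    Hk_c, Hp_c = Hk - mu_k, Hp - mu_p
    n = H.shape[0]
    C_kk = Hk_c.T @ Hk_c / n   # covariance of kept activations
    C_kp = Hk_c.T @ Hp_c / n   # cross-covariance kept -> pruned
    # Closed-form ridge solution: (C_kk + lam * I) A^T = C_kp
    A = np.linalg.solve(C_kk + lam * np.eye(len(keep)), C_kp).T
    c = mu_p - A @ mu_k
    return A, c

def fold_into_output_projection(W2, b2, keep, prune, A, c):
    """Fold the compensation into y = W2 @ h + b2 so only kept units are needed."""
    W2_new = W2[:, keep] + W2[:, prune] @ A
    b2_new = b2 + W2[:, prune] @ c
    return W2_new, b2_new

# Synthetic check (hypothetical data): the pruned units are an exact affine
# function of the kept units, so compensation should recover the output.
rng = np.random.default_rng(0)
keep, prune = list(range(6)), list(range(6, 10))
Hk = rng.normal(size=(256, 6))
M, v = rng.normal(size=(4, 6)), rng.normal(size=4)
H = np.concatenate([Hk, Hk @ M.T + v], axis=1)   # (256, 10) hidden activations

W2, b2 = rng.normal(size=(4, 10)), rng.normal(size=4)
A, c = ridge_affine_compensation(H, keep, prune, lam=1e-8)
W2_new, b2_new = fold_into_output_projection(W2, b2, keep, prune, A, c)

y_full = H @ W2.T + b2                     # output of the unpruned layer
y_pruned = H[:, keep] @ W2_new.T + b2_new  # output using kept units only
print("max abs error:", np.abs(y_full - y_pruned).max())
```

In this toy setting the redundancy is exact, so the compensated layer reproduces the original output up to the small bias introduced by `lam`. On real activations the fit would only be approximate, and the regularization strength would trade off calibration error against stability.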
