COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression
Denis Makhov, Dmitriy Shopkhoev, Magauiya Zhussip, Ammar Ali, Baher Mohammad, Stamatios Lefkimmiatis
TL;DR
COMPOT tackles post-training compression of Transformer projection matrices by using calibration data to estimate a sparse factorization in a whitened space. It enforces an orthogonal dictionary with $D_O^T D_O = I_k$ and column-wise sparse codes with $\|s_{O,j}\|_0 \le s$, where $s_{O,j}$ is the $j$-th column of $S_O$, and reconstructs $\widehat{W} = A S_O$ with $A = L^{-T} D_O$, where $G = X^T X = L L^T$ factors the Gram matrix of the calibration inputs $X$. A one-shot global allocation pools normalized singular values across matrices to set per-matrix ranks under a model-wide budget. The method yields closed-form dictionary updates via orthogonal Procrustes and analytic sparse coding, avoiding iterative sparse-pursuit algorithms, and shows strong improvements over SVD-based and dictionary-learning baselines while remaining compatible with post-training quantization. Across language, vision-language, and audio tasks, COMPOT delivers substantial quality gains at comparable memory budgets, which makes it practical for deploying large Transformers efficiently.
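A minimal NumPy sketch of the whitened factorization described above, under stated assumptions. Assuming the objective is the calibration output error $\|X(W - \widehat{W})\|_F$ for a layer $y = xW$, the identity $\|X M\|_F = \|L^T M\|_F$ (valid for any $M$ when $G = X^T X = L L^T$; approximate here because of a small ridge) turns the problem into approximating the whitened weight $L^T W \approx D_O S_O$, from which $\widehat{W} = L^{-T} D_O S_O = A S_O$ follows. With $D_O^T D_O = I_k$, the best $s$-sparse code for each column keeps the $s$ largest-magnitude entries of $D_O^T L^T w_j$, and the best orthogonal dictionary for fixed codes is the Procrustes solution $U V^T$ from the SVD of $L^T W S_O^T$. The layout convention, the ridge, the SVD initialization, the number of alternation rounds, and all function names are hypothetical choices, not taken from the paper; whether COMPOT alternates the two closed-form steps is not stated here.

```python
import numpy as np


def whiten(X, W, eps=1e-6):
    """Cholesky factor L of G = X^T X (plus a small ridge) and the whitened weight L^T W."""
    G = X.T @ X
    d = G.shape[0]
    G = G + eps * np.trace(G) / d * np.eye(d)  # hypothetical ridge so Cholesky succeeds
    L = np.linalg.cholesky(G)  # lower triangular, G = L @ L.T
    return L, L.T @ W


def sparse_code(D, Ww, s):
    """Best s-sparse codes for orthonormal D: keep the s largest |D^T w_j| per column."""
    C = D.T @ Ww
    top = np.argsort(-np.abs(C), axis=0)[:s]  # row indices of the s largest magnitudes
    mask = np.zeros(C.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=0)
    return np.where(mask, C, 0.0)


def procrustes_dict(Ww, S):
    """Best D with D^T D = I for fixed codes: D = U V^T from the thin SVD of Ww S^T."""
    U, _, Vt = np.linalg.svd(Ww @ S.T, full_matrices=False)
    return U @ Vt


def factorize(X, W, k, s, rounds=5):
    """Return A = L^{-T} D_O and S_O with W_hat = A @ S_O (requires k <= W.shape[0])."""
    L, Ww = whiten(X, W)
    D = np.linalg.svd(Ww)[0][:, :k]  # init: top-k left singular vectors (assumption)
    for _ in range(rounds):  # alternation count is a hypothetical choice
        S = sparse_code(D, Ww, s)
        D = procrustes_dict(Ww, S)
    S = sparse_code(D, Ww, s)
    A = np.linalg.solve(L.T, D)  # solves L^T A = D, i.e. A = L^{-T} D_O
    return A, S
```

Usage: `A, S = factorize(X, W, k=8, s=3)` with `X` of shape `(n, d_in)` and `W` of shape `(d_in, d_out)`. Each step exactly minimizes the whitened error over one factor with the other fixed, so the error cannot increase across rounds; with `s == k` no entries are dropped and the sketch reduces to truncated SVD in the whitened space.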
Abstract
Post-training compression of Transformer models commonly relies on truncated singular value decomposition (SVD). However, enforcing a single shared subspace can degrade accuracy even at moderate compression. Sparse dictionary learning provides a more flexible union-of-subspaces representation, but existing approaches often suffer from costly iterative updates of the dictionary and coefficients. We propose COMPOT (Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers), a training-free compression framework that uses a small calibration dataset to estimate a sparse weight factorization. COMPOT employs orthogonal dictionaries that enable closed-form Procrustes updates for the dictionary and analytical single-step sparse coding for the coefficients, eliminating iterative optimization. To handle heterogeneous layer sensitivity under a global compression budget, COMPOT further introduces a one-shot dynamic allocation strategy that adaptively redistributes layer-wise compression rates. Extensive experiments across diverse architectures and tasks show that COMPOT consistently delivers a superior quality-compression trade-off over strong low-rank and sparse baselines, while remaining fully compatible with post-training quantization for extreme compression. Code is available at https://github.com/mts-ai/COMPOT.
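The one-shot global allocation can be sketched as a single pooled ranking. In this hypothetical version, each matrix's singular values (for example, those of its whitened weight) are divided by their sum so that spectra of differently scaled matrices become comparable, all values beyond a minimum rank are pooled and sorted, and ranks are granted in that order until the next unit would exceed the parameter budget. The sum normalization, the per-rank cost model, the minimum rank, and the function name `allocate_ranks` are assumptions for illustration, not COMPOT's exact rule.

```python
import numpy as np


def allocate_ranks(singular_values, costs, budget, min_rank=1):
    """Hypothetical one-shot global rank allocation.

    singular_values: list of 1-D arrays (descending), one per matrix.
    costs: parameters consumed per unit of rank, one per matrix.
    budget: total parameter budget shared by all matrices.
    """
    ranks = [min_rank] * len(singular_values)
    spent = min_rank * sum(costs)  # every matrix starts at min_rank
    pool = []
    for i, sv in enumerate(singular_values):
        normed = np.asarray(sv, dtype=float) / np.sum(sv)  # sum normalization (assumption)
        pool.extend((float(normed[r]), i) for r in range(min_rank, len(normed)))
    pool.sort(key=lambda item: -item[0])  # stable sort: ties keep insertion order
    for _, i in pool:
        if spent + costs[i] > budget:
            break  # the first value that does not fit acts as a global threshold
        ranks[i] += 1
        spent += costs[i]
    return ranks
```

For instance, with spectra `[4, 2, 1, 1]` and `[1, 1, 1, 1]`, a cost of 10 per rank, and a budget of 60, the sketch returns `[2, 4]`: the flat spectrum keeps more directions because its tail carries more normalized energy. For a plain low-rank factor of an $m \times n$ matrix the per-rank cost would be $m + n$; the cost of COMPOT's dictionary plus sparse codes depends on how the codes are stored and is not specified here.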
