Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives
Qinsi Wang, Jinghan Ke, Masayoshi Tomizuka, Yiran Chen, Kurt Keutzer, Chenfeng Xu
TL;DR
Dobi-SVD introduces a principled, differentiable SVD-based compression framework for LLMs that truncates activations rather than weights, paired with an IPCA-based weight update and a novel remapped storage scheme. It makes the truncation position smooth and therefore learnable, applies Incremental PCA (IPCA) to derive the optimal rank-k weight, and remaps storage so that truncation position and compression ratio correspond one-to-one (a bijection). Together these steps overcome longstanding limitations of SVD and keep task-performance degradation minimal at high compression. Empirical results on LLaMA-family models show state-of-the-art performance among SVD-based compression methods, substantial hardware speedups, and compatibility with quantization, while extensions to vision-language and vision-language-action models demonstrate generality. The work has practical implications for deploying large models on resource-constrained hardware, edge devices, and robotics, where memory and compute efficiency are critical.
Abstract
We provide a new LLM-compression solution via SVD, unlocking new possibilities for LLM compression beyond quantization and pruning. We point out that the optimal use of SVD lies in truncating activations, rather than merely using activations as an optimization distance. Building on this principle, we address three critical challenges in SVD-based LLM compression: (1) How can we determine the optimal activation truncation position for each weight matrix in LLMs? (2) How can we efficiently reconstruct the weight matrices based on truncated activations? (3) How can we address the inherent "injection" nature of SVD that results in information loss? We propose Dobi-SVD, which establishes a new, principled approach to SVD-based LLM compression.
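The distinction between truncating weights and truncating activations can be illustrated with a small NumPy sketch. This is not the Dobi-SVD implementation: it uses a single random batch, a plain SVD in place of IPCA, and a fixed rank k in place of a learned truncation position. The helper names `truncate_weight` and `truncate_activation` and all shapes are assumptions made for illustration. The only fact it relies on is the Eckart-Young theorem: the rank-k truncation of the activation A = XW is the best rank-k approximation of A in Frobenius norm, so it can never do worse than pushing inputs through a rank-k truncation of W.

```python
import numpy as np

def truncate_weight(W, k):
    """Rank-k truncation of the weight matrix itself (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def truncate_activation(X, W, k):
    """Rank-k weight whose output equals the rank-k truncation of A = X @ W.

    Projecting W onto the top-k right singular vectors of A gives
    X @ W_new = A @ Vk @ Vk.T, which is exactly the truncated SVD of A.
    """
    A = X @ W
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k].T                      # shape (d_out, k)
    return W @ Vk @ Vk.T               # shape (d_in, d_out), rank <= k

# Toy data: anisotropic inputs so that input statistics matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 64)) * np.linspace(0.1, 5.0, 64)
W = rng.standard_normal((64, 48))
k = 8

A = X @ W
err_weight = np.linalg.norm(A - X @ truncate_weight(W, k))
err_act = np.linalg.norm(A - X @ truncate_activation(X, W, k))
print(f"activation error, weight truncation:     {err_weight:.3f}")
print(f"activation error, activation truncation: {err_act:.3f}")
```

On any input, the second error is less than or equal to the first. How large the gap is depends on how far the input distribution is from isotropic.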
