Globally optimized SVD compression of LLMs via Fermi-function-based rank selection and gauge fixing
Roman Rausch, David Jansen, Sukhbinder Singh, Román Orús
TL;DR
The paper targets memory-efficient deployment of large language models by improving data-aware SVD compression. It introduces FermiGrad, a gradient-based method that globally optimizes per-layer SVD ranks by soft-truncating singular values with a Fermi function, and PivGa, a lossless secondary compression leveraging gauge freedom via Interpolative Decomposition. Together, these methods achieve better accuracy at fixed model size than uniform rank reductions, with practical trade-offs between speed and compression. The techniques offer a principled, physics-inspired route to high-quality, compact LLMs suitable for edge and resource-constrained settings.
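The soft truncation mentioned above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the parametrization by a continuous cutoff `mu` and a `temperature`, and the use of a plain (not data-aware) SVD are all assumptions made for illustration. The Fermi function 1 / (exp((i - mu) / T) + 1) assigns each singular-value index i a weight between 0 and 1, so the cutoff becomes a continuous quantity that a gradient-based optimizer could adjust.

```python
import numpy as np

def fermi_mask(num_values, mu, temperature):
    """Fermi-function weights for singular-value indices 0..num_values-1.

    mu (hypothetical name) is a continuous cutoff position and temperature
    controls how sharp the cutoff is. Both are illustrative parameters.
    """
    idx = np.arange(num_values)
    # Clip the exponent to avoid overflow for very small temperatures.
    x = np.clip((idx - mu) / temperature, -60.0, 60.0)
    return 1.0 / (1.0 + np.exp(x))

def soft_truncate(W, mu, temperature):
    """Reconstruct W with its singular values damped by the Fermi mask."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    mask = fermi_mask(len(s), mu, temperature)
    return (U * (s * mask)) @ Vt

def soft_rank(num_values, mu, temperature):
    """Sum of mask weights: a continuous stand-in for the retained rank."""
    return fermi_mask(num_values, mu, temperature).sum()
```

At low temperature the mask approaches a step function, so `soft_truncate` approaches an ordinary hard truncation. At higher temperature the cutoff is smooth, which is what makes the rank amenable to gradient descent in this kind of relaxation.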
Abstract
Large Language Models (LLMs) demand substantial computational resources. Low-rank decomposition of LLM weights, e.g. via Singular Value Decomposition (SVD), is a promising approach to LLM compression, but it presents several practical hurdles, e.g. selecting appropriate layer-wise ranks and eliminating the parameter redundancy of the low-rank factors. In this work, we present two physics-inspired improvements to SVD-based LLM compression: (1) \textbf{FermiGrad}, a gradient-descent algorithm that determines globally optimal layer-wise ranks by relaxing the discrete singular-value truncation into a continuous optimization using the Fermi function; (2) \textbf{PivGa}, an additional \textit{lossless} compression of the low-rank factors that exploits the intrinsic gauge freedom in their parametrization.
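The gauge freedom referred to above can be made concrete with a short sketch. For low-rank factors A (m x r) and B (r x n), the product A B is unchanged under A -> A G, B -> G^{-1} B for any invertible r x r matrix G, so r^2 of the r(m + n) stored parameters are redundant. The code below is an assumption-laden illustration, not the PivGa algorithm itself: it picks r rows of A with a greedy pivoted Gram-Schmidt procedure (swapped in here as a simple stand-in for the pivoting of an interpolative decomposition) and chooses G so that those rows become the identity. All function names are hypothetical.

```python
import numpy as np

def select_pivot_rows(A):
    """Greedily pick r linearly independent rows of A (pivoted Gram-Schmidt).

    Assumes A has full column rank r; this is a simple stand-in for the
    pivoting step of an interpolative decomposition.
    """
    m, r = A.shape
    residual = A.astype(float).copy()
    rows = []
    for _ in range(r):
        norms = np.linalg.norm(residual, axis=1)
        p = int(np.argmax(norms))          # row with the largest residual norm
        rows.append(p)
        q = residual[p] / norms[p]
        residual -= np.outer(residual @ q, q)  # project out the chosen direction
    return np.array(rows)

def fix_gauge(A, B):
    """Transform (A, B) so that the pivot rows of A form the identity."""
    rows = select_pivot_rows(A)
    M = A[rows, :]                          # invertible r x r block of A
    A_fixed = np.linalg.solve(M.T, A.T).T   # A @ inv(M)
    B_fixed = M @ B                         # compensating transformation
    return A_fixed, B_fixed, rows

def stored_parameters(m, n, r):
    """Float count once the identity block of A_fixed is no longer stored."""
    return r * (m + n) - r * r
```

After gauge fixing, the product `A_fixed @ B_fixed` equals `A @ B` up to floating-point error, while the r x r identity block in `A_fixed` can be replaced by the r pivot indices. Under these assumptions that saves r^2 floating-point parameters without changing the represented matrix.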
