Optimizing Singular Spectrum for Large Language Model Compression

Dengjie Li, Tiancheng Shen, Yao Zhou, Baisong Yang, Zhongying Liu, Masheng Yang, Bernard Ghanem, Yibo Yang, Yujie Zhong, Ming-Hsuan Yang

TL;DR

This work tackles the compression of large language models by questioning whether descending singular-value order is a reliable measure of component importance. It introduces SoCo, which inserts a learnable diagonal matrix $S$ to reweight the singular spectrum, giving $W' = U (\Sigma \odot S) V^\top$, and employs a three-stage optimization (Stage 1: rapid compression; Stage 2: alternating refinement; Stage 3: sparsity) with a deviation term to preserve performance. The method yields a data-driven, end-to-end re-evaluation of component significance and outperforms existing SVD-based and pruning methods across multiple LLMs and benchmarks. Practically, SoCo enables aggressive, task-aware compression with minimal accuracy loss, supporting efficient deployment in resource-constrained environments.

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities, yet prohibitive parameter complexity often hinders their deployment. Existing singular value decomposition (SVD) based compression methods simply treat singular values as importance scores of the decomposed components. However, importance ordered by singular values does not necessarily correlate with performance on a downstream task. In this work, we introduce SoCo (Singular spectrum optimization for large language model Compression), a novel compression framework that learns to rescale the decomposed components of SVD in a data-driven manner. Concretely, we employ a learnable diagonal matrix to assign importance scores to the singular spectrum and develop a three-stage training process that progressively refines these scores from initial coarse compression to fine-grained sparsification, thereby striking an effective balance between aggressive model compression and performance preservation. Thanks to the learnable singular spectrum, SoCo adaptively prunes components according to the sparsified importance scores, rather than relying on the fixed order of singular values. More importantly, the remaining components with amplified importance scores can compensate for the loss of the pruned ones. Experimental evaluations across multiple LLMs and benchmarks demonstrate that SoCo surpasses state-of-the-art methods in model compression.
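The rescale-and-prune idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `prune_by_scores` is hypothetical, the score values are hand-picked rather than learned, and the three-stage training that produces the scores is omitted entirely. The threshold of 0.5 is the example value mentioned in the Figure 2 caption.

```python
import numpy as np

def prune_by_scores(W, scores, threshold=0.5):
    """Factor W as U diag(sigma) V^T, rescale sigma by `scores`,
    and drop every component whose score is below `threshold`.

    Returns (A, B, keep) where A @ B is the compressed weight matrix.
    Hypothetical helper for illustration only.
    """
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
    keep = scores >= threshold              # components that survive pruning
    rescaled = sigma[keep] * scores[keep]   # Sigma ⊙ S on the kept entries
    A = U[:, keep] * rescaled               # scale each kept column of U
    B = Vt[keep, :]                         # matching rows of V^T
    return A, B, keep

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))

# Hand-picked scores (assumption): note that the kept set {0, 2, 3} is not
# simply the top-3 singular values, and a score above 1 amplifies a component.
scores = np.array([1.2, 0.1, 0.9, 0.7, 0.0, 0.3])
A, B, keep = prune_by_scores(W, scores)
W_compressed = A @ B   # rank-3 approximation built from the kept components
```

With all scores set to 1 and nothing pruned, `A @ B` reproduces `W` up to floating-point error, so the scores act purely as a reweighting of the original spectrum.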

Paper Structure

This paper contains 24 sections, 11 equations, 8 figures, 10 tables, 1 algorithm.

Figures (8)

  • Figure 1: SoCo consistently outperforms existing methods across a range of compression ratios, yielding lower perplexity on C4 (left) and higher average classification accuracy on LM-Evaluation-Harness (right).
  • Figure 2: Illustration of the overall framework of SoCo. The pre-trained weight matrix $W$ is decomposed into $U \Sigma V^\top$, where $\Sigma$ is a diagonal matrix with diagonal elements arranged in descending order. a) Existing SVD-based compression methods truncate the smaller singular values together with the corresponding columns of $U$ and rows of $V^\top$. b) The proposed SoCo assigns a diagonal matrix $S$ as importance scores to the singular values in $\Sigma$. After training, singular values with an importance score below a given threshold (e.g., 0.5) are pruned. In particular, importance scores above the threshold rescale the preserved singular values to compensate for the loss from the pruned components.
  • Figure 3: Distribution of importance scores after each of our three training stages, illustrating the dynamic re-evaluation process.
  • Figure 4: Preservation ratio of components in the original SVD-based importance order across the three stages.
  • Figure 5: Function graph of the importance scores $S$. The orange function denotes the $S$ used in SoCo, while the blue function represents the standard sigmoid function.
  • ...and 3 more figures
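
As a rough guide to the compression ratios referred to in the figures, the parameter cost of a factored layer can be worked out directly: keeping $k$ components of an $m \times n$ matrix stores $k(m + n)$ values instead of $mn$. The sketch below assumes that "compression ratio" means the fraction of parameters removed, which is a common convention but is not confirmed by the text above. Both function names and the 4096 × 4096 layer size are hypothetical.

```python
import math

def factored_param_count(m, n, k):
    """Parameters stored by rank-k factors of shape (m, k) and (k, n)."""
    return k * (m + n)

def max_rank_for_ratio(m, n, ratio):
    """Largest k whose factors fit within (1 - ratio) of the dense m * n budget.

    Assumes `ratio` is the fraction of parameters removed.
    """
    budget = (1.0 - ratio) * m * n
    return math.floor(budget / (m + n))

m, n = 4096, 4096                      # hypothetical square projection layer
k = max_rank_for_ratio(m, n, 0.2)      # remove 20% of the parameters
dense = m * n
factored = factored_param_count(m, n, k)
```

For a square layer the factored form only saves parameters once fewer than half of the components are kept, which is why retained components need to carry as much of the layer's function as possible.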