Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression

Jingcun Wang; Yu-Guang Chen; Ing-Chao Lin; Bing Li; Grace Li Zhang

TL;DR

Comprehensive experiments demonstrate that Basis Sharing outperforms state-of-the-art SVD-based compression approaches and parameter sharing techniques, especially under large compression ratios.

Abstract

Large Language Models (LLMs) have achieved remarkable breakthroughs. However, the huge number of parameters in LLMs requires a significant amount of memory during inference, which prevents their practical deployment in many applications. To reduce the memory footprint of LLMs, singular value decomposition (SVD) provides a promising way to approximate weight matrices. In this paper, we take a step further and explore parameter sharing across different layers with SVD to achieve more effective compression. Specifically, weight matrices in different layers are decomposed and represented as linear combinations of a set of shared basis vectors with layer-specific coefficients. The types of weight matrices and the selection of layers for basis sharing are examined to maintain performance when compressing LLMs. Comprehensive experiments demonstrate that Basis Sharing outperforms state-of-the-art SVD-based compression approaches and parameter sharing techniques, especially under large compression ratios. Code is available at: https://github.com/TUDa-HWAI/Basis_Sharing
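The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the repository above for that): the function name `share_basis`, the toy dimensions, and the synthetic weights are assumptions made for illustration, and the sketch omits any activation-aware scaling or layer selection the paper applies. It concatenates the weight matrices of several layers horizontally, runs a truncated SVD, and keeps one shared basis plus one coefficient matrix per layer.

```python
import numpy as np


def share_basis(weights, k):
    """Hypothetical helper: factor same-shaped layer weights into one shared basis.

    weights: list of n arrays, each of shape (d1, d2).
    Returns a basis of shape (d1, k) and a list of n coefficient
    matrices of shape (k, d2), so that weights[i] ~= basis @ coeffs[i].
    """
    stacked = np.hstack(weights)                      # (d1, n * d2)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    basis = u[:, :k] * s[:k]                          # fold singular values into the basis
    coeffs = np.split(vt[:k], len(weights), axis=1)   # one (k, d2) block per layer
    return basis, coeffs


# Toy data (assumption): two layers whose columns lie close to a common subspace.
rng = np.random.default_rng(0)
d1, d2, n, k = 64, 64, 2, 16
common = rng.standard_normal((d1, 8))
weights = [
    common @ rng.standard_normal((8, d2)) + 0.01 * rng.standard_normal((d1, d2))
    for _ in range(n)
]

basis, coeffs = share_basis(weights, k)
approx = [basis @ c for c in coeffs]

# Parameter count: n * d1 * d2 originally, k * (d1 + n * d2) after sharing.
original_params = n * d1 * d2
shared_params = basis.size + sum(c.size for c in coeffs)
rel_error = max(
    np.linalg.norm(w - a) / np.linalg.norm(w) for w, a in zip(weights, approx)
)
```

In this sketch the basis is stored once regardless of how many layers share it, so the saving grows with the number of layers in a group, while each layer keeps its own coefficients.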

Paper Structure (29 sections, 4 equations, 9 figures, 8 tables)

Introduction
Related Work
Large Language Model Compression
SVD-based Weight Compression
Parameter Sharing
Methodology
Representing Weight Matrices across Layers with Combinations of Basis Vectors and Coefficients
Selection of Weight Matrices in LLMs for Cross-Layer Parameter Sharing
Selection of Layers for Basis Sharing
Experiments
Settings
Baseline
Models and Datasets
Implementation details
Results
...and 14 more sections

Figures (9)

Figure 1: (a) Two layers share the same weight matrix in previous work. (b) Two layers share the same basis matrix but have their individual coefficients in our work.
Figure 2: Weight matrices across $n$ layers are concatenated horizontally into a weight matrix, which is processed by SVD. The $j^{th}$ column of the original weight matrix in a layer can be represented as a linear combination of $k$ shared basis vectors and coefficients.
Figure 3: PPL ($\downarrow$) of three different LLMs -- OPT-6.7B, LLaMA 2-7B, and Mistral-7B -- under 20% compression ratio on WikiText-2.
Figure 4: Frobenius loss incurred by basis sharing across any two layers. The number/color in a block represents the resulting Frobenius loss if a basis matrix is shared by two layers and the numbers in the diagonal direction are obtained by applying SVD to the scaled weight matrix of a layer directly. (a) Frobenius loss incurred by basis sharing across two layers for ${\bm{W}}_K$ in LLaMA2-7B. (b) Frobenius loss incurred by basis sharing across two layers for ${\bm{W}}_O$ in LLaMA2-7B.
Figure 5: LoRA fine-tuning results of LLaMA-7B under 20% compression ratio with different compression methods.
...and 4 more figures
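
The pairwise comparison described in the Figure 4 caption can be approximated with a small experiment. The sketch below is an illustration under stated assumptions, not the paper's procedure: it works on raw (unscaled) weight matrices, the names `truncation_error` and `pairwise_sharing_loss` are hypothetical, and the rule used to pick ranks (keep the same fraction `keep_ratio` of parameters in both the shared and the single-layer case) is an assumption. It relies on the standard fact that the squared Frobenius error of the best rank-k approximation equals the sum of the squared discarded singular values.

```python
import numpy as np


def truncation_error(matrix, rank):
    """Squared Frobenius error of the best rank-`rank` approximation of `matrix`."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return float(np.sum(singular_values[rank:] ** 2))


def pairwise_sharing_loss(weights, keep_ratio):
    """Hypothetical helper: loss table for sharing a basis between any two layers.

    Entry (i, i) is the error of compressing layer i on its own.
    Entry (i, j), i != j, is the total error over layers i and j when
    both are reconstructed from one shared basis.
    """
    n = len(weights)
    d1, d2 = weights[0].shape
    # Ranks chosen so each case stores about keep_ratio of the original parameters
    # (assumed budget rule): k * (d1 + d2) for one layer, k * (d1 + 2 * d2) for a pair.
    k_single = int(keep_ratio * d1 * d2 / (d1 + d2))
    k_shared = int(keep_ratio * 2 * d1 * d2 / (d1 + 2 * d2))
    loss = np.zeros((n, n))
    for i in range(n):
        loss[i, i] = truncation_error(weights[i], k_single)
        for j in range(i + 1, n):
            stacked = np.hstack([weights[i], weights[j]])
            loss[i, j] = loss[j, i] = truncation_error(stacked, k_shared)
    return loss


# Toy data (assumption): layers 0 and 1 are identical, layer 2 is unrelated.
rng = np.random.default_rng(0)
first = rng.standard_normal((16, 16))
layers = [first, first.copy(), rng.standard_normal((16, 16))]
loss_table = pairwise_sharing_loss(layers, keep_ratio=0.5)
```

Because an off-diagonal entry covers two layers, it should be compared with the sum of the two corresponding diagonal entries; sharing is attractive for a pair when the off-diagonal value is not larger than that sum.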
