Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other

Yifei Gao, Jie Ou, Lei Wang, Yuting Xiao, Zhiyuan Xiang, Ruiting Dai, Jun Cheng

TL;DR

The paper tackles quantizing large language models without sacrificing accuracy by introducing Learnable Singular Value Increment (LSI), an SVD-based, data-free approach that makes singular values learnable and incrementally perturbs weight distributions to better fit discrete quantization targets. By integrating LSI with smoothing techniques such as SmoothQuant, LWC, and LET, the method transfers quantization difficulty and promotes a hierarchical, self-adjusting weight structure that compensates for errors across the network. Empirically, LSI achieves state-of-the-art results in weight-only and weight-activation quantization across the OPT and LLaMA families, including ultra-low-bit regimes, and enables fast finetuning of quantized models with minimal data and training. The work demonstrates practical impact by preserving inference speed, maintaining hardware efficiency, and offering a data-free PTQ pathway for deploying large quantized LLMs.

Abstract

Emergent Large Language Models (LLMs) distinguish themselves from traditional language models through their extraordinary performance and powerful reasoning capacity. However, the computational and storage costs of these LLMs are staggering, so quantization has become a trending topic. To address the accuracy decay caused by quantization, two streams of post-training quantization methods stand out. One uses other weights to compensate for existing quantization error, while the other transfers the quantization difficulty to other parts of the model. Combining the merits of both, we introduce Learnable Singular value Increment (LSI) as an advanced solution. LSI uses Singular Value Decomposition to extract the singular values of the weights and makes them learnable, helping weights compensate for each other conditioned on activation. By incorporating LSI into existing techniques, we achieve state-of-the-art performance in diverse quantization settings, whether in weight-only, weight-activation, or extremely low-bit scenarios. By unleashing the potential of LSI, efficient finetuning of quantized models is no longer a prohibitive problem.
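The core idea described in the abstract can be sketched in a few lines: decompose a weight matrix with SVD, treat an additive increment on the singular values as the learnable parameter, and optimize that increment so the quantized, reconstructed weights better match the full-precision layer. The sketch below is a minimal illustration under these assumptions; the function names (`lsi_reconstruct`, `quantize`) and the per-tensor symmetric quantizer are ours, not the paper's API.

```python
import numpy as np

def lsi_reconstruct(W, delta):
    """Rebuild W from its SVD with a learnable increment added to the singular values.

    With delta == 0 this recovers W exactly (up to floating-point error); a learned
    delta perturbs the weight distribution to better fit the quantization grid.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(S + delta) @ Vt

def quantize(W, n_bits=4):
    """Simple symmetric per-tensor uniform quantizer (illustrative baseline)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(W).max() / qmax
    return np.round(W / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))

# Zero increment: reconstruction recovers the original weights.
W_rec = lsi_reconstruct(W, np.zeros(8))
print(np.allclose(W, W_rec))  # True

# A (here random, in practice learned) increment slightly perturbs the weights;
# LSI would optimize delta so that quantize(lsi_reconstruct(W, delta)) applied
# to activations reproduces the full-precision layer output.
delta = 0.01 * rng.normal(size=8)
err = np.linalg.norm(quantize(lsi_reconstruct(W, delta)) - W)
```

In the paper's setting the increment is trained end to end (optionally jointly with smoothing parameters such as those of LWC/LET), whereas here the increment is random purely for illustration.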

Paper Structure

This paper contains 22 sections, 8 equations, 2 figures, 11 tables.

Figures (2)

  • Figure 1: Weight distribution comparison between the original weights and the weights after LSI training.
  • Figure 2: Training details for LSI on OPT-6.7B and OPT-30B in the W4A16g128 setting. The magnitude is not drawn to scale. Since OPT-30B is highly sensitive to the number of training epochs, in the OPT-30B part one training epoch contains 24 samples, and we train from 8 samples up to 144 samples in increments of 8 samples.