Accurate LoRA-Finetuning Quantization of LLMs via Information Retention

Haotong Qin; Xudong Ma; Xingyu Zheng; Xiaoyang Li; Yang Zhang; Shouda Liu; Jie Luo; Xianglong Liu; Michele Magno

TL;DR

IR-QLoRA tackles the accuracy drop in LoRA-finetuned quantized LLMs by introducing two information-centric components: Information Calibration Quantization (ICQ) and Information Elastic Connection (IEC). ICQ preserves information during low-bit quantization by optimizing a calibration offset and scale so that the entropy of the quantized weights is maximized, which reduces information loss. IEC augments LoRA with parameter-free, elastic transformations that reuse the original representation, giving LoRA access to richer information. Across LLaMA and LLaMA2 models from 7B to 65B parameters, IR-QLoRA yields consistent accuracy gains at 2- to 4-bit widths, with modest overhead and broad compatibility with both NormalFloat and integer quantizers, advancing the practical deployment of quantized, LoRA-finetuned LLMs.
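To make the ICQ idea concrete, here is a minimal pure-Python sketch of an entropy-maximizing offset search for a single weight block. It assumes the form w_hat = Q((w - tau) / s) with s = absmax(w - tau), uses a uniform 4-bit grid as an integer-style stand-in for NormalFloat, and searches tau over the window median(w) +/- 0.1 * std(w). The window, the grid, and every function name (uniform_levels, quantize_block, index_entropy, icq_search) are illustrative assumptions, not the authors' implementation. The intuition it demonstrates is that an offset which spreads weights more evenly across the quantization levels yields a higher-entropy code, so more of the original information survives.

```python
import math
from collections import Counter
from statistics import median, pstdev


def uniform_levels(bits):
    """Symmetric uniform k-bit grid on [-1, 1]; an integer-style stand-in for NormalFloat levels."""
    n = 2 ** bits
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]


def quantize_block(w, tau, levels):
    """Shift the block by tau, scale by its absmax, and round each weight to the nearest level index."""
    shifted = [x - tau for x in w]
    s = max(abs(v) for v in shifted) or 1.0  # guard against a block that equals tau everywhere
    idx = [min(range(len(levels)), key=lambda j: abs(v / s - levels[j])) for v in shifted]
    return idx, s


def dequantize(idx, s, tau, levels):
    """Invert quantize_block: map level indices back to approximate weights."""
    return [levels[j] * s + tau for j in idx]


def index_entropy(idx):
    """Shannon entropy (bits) of the empirical distribution of quantization indices."""
    n = len(idx)
    return -sum((c / n) * math.log2(c / n) for c in Counter(idx).values())


def icq_search(w, levels, lam=0.1, steps=100):
    """Grid-search the offset tau that maximizes the entropy of the quantized block.

    Assumption: candidates span median(w) +/- lam * pstdev(w); the paper's exact range may differ.
    """
    center, spread = median(w), pstdev(w)
    best_tau, best_h = center, -1.0
    for k in range(steps + 1):
        tau = center - lam * spread + 2 * lam * spread * k / steps
        h = index_entropy(quantize_block(w, tau, levels)[0])
        if h > best_h:
            best_tau, best_h = tau, h
    return best_tau, best_h


if __name__ == "__main__":
    # Deterministic toy block of 64 weights (illustrative data, not from the paper).
    w = [((i * 37) % 101 - 50) / 100 + 0.05 for i in range(64)]
    levels = uniform_levels(4)
    tau, h = icq_search(w, levels)
    idx, s = quantize_block(w, tau, levels)
    err = max(abs(a - b) for a, b in zip(w, dequantize(idx, s, tau, levels)))
    print(f"tau={tau:.4f}  entropy={h:.3f} bits  max|w - w_hat|={err:.4f}")
```

Because `icq_search` takes the level grid as an argument, a 16-entry NormalFloat codebook could be passed in place of `uniform_levels(4)`; the search itself does not depend on which quantizer is used.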

Abstract

The LoRA-finetuning quantization of LLMs has been extensively studied as a way to obtain accurate yet compact LLMs for deployment on resource-constrained hardware. However, existing methods cause the quantized LLM to degrade severely and even fail to benefit from LoRA finetuning. This paper proposes IR-QLoRA, a novel method that pushes quantized LLMs with LoRA to be highly accurate through information retention. IR-QLoRA relies mainly on two techniques derived from a unified information perspective: (1) statistics-based Information Calibration Quantization (ICQ) allows the quantized parameters of the LLM to retain the original information accurately; (2) finetuning-based Information Elastic Connection (IEC) lets LoRA use elastic representation transformations carrying diverse information. Comprehensive experiments show that IR-QLoRA significantly improves accuracy across the LLaMA and LLaMA2 families at 2-4 bit-widths; e.g., 4-bit LLaMA-7B achieves a 1.4% improvement on MMLU over state-of-the-art methods. This gain costs only 0.31% additional time, showing that IR-QLoRA is efficient. IR-QLoRA is also versatile: it is compatible with various frameworks (e.g., NormalFloat and integer quantization) and brings general accuracy gains. The code is available at https://github.com/htqin/ir-qlora.
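A minimal pure-Python sketch of the IEC idea for a single input vector follows. It assumes the LoRA branch is B(Ax), with A of shape r x h and B of shape o x r, and that the two parameter-free connections are (i) averaging the input over r contiguous groups and adding the result to Ax, and (ii) cyclically repeating the rank-r activation to the output width and adding it to the B projection. The grouping and repetition scheme, and the names group_average, tile_to and lora_iec_forward, are assumptions made for illustration; the paper's exact formulation may differ, for example in how entries are grouped or scaled.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one inner list per output row


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def group_average(x: Vector, r: int) -> Vector:
    """Parameter-free h -> r map: average x over r contiguous groups of h // r entries."""
    h = len(x)
    if h % r:
        raise ValueError(f"input dim {h} is not divisible by rank {r}")
    g = h // r
    return [sum(x[i * g:(i + 1) * g]) / g for i in range(r)]


def tile_to(z: Vector, o: int) -> Vector:
    """Parameter-free r -> o map: repeat z cyclically until it has o entries."""
    r = len(z)
    if o % r:
        raise ValueError(f"output dim {o} is not divisible by rank {r}")
    return [z[j % r] for j in range(o)]


def lora_iec_forward(x: Vector, a: Matrix, b: Matrix, scale: float = 1.0) -> Vector:
    """LoRA branch scale * B(Ax), augmented with two assumed parameter-free elastic connections.

    a: r x h down-projection; b: o x r up-projection.
    """
    r, o = len(a), len(b)
    z = [u + v for u, v in zip(matvec(a, x), group_average(x, r))]  # h -> r, plus shortcut
    y = [u + v for u, v in zip(matvec(b, z), tile_to(z, o))]  # r -> o, plus shortcut
    return [scale * v for v in y]


if __name__ == "__main__":
    h, r, o = 8, 2, 4
    x = [float(i) for i in range(1, h + 1)]
    a = [[0.0] * h for _ in range(r)]
    b = [[0.0] * r for _ in range(o)]  # zero-initialized up-projection, as in standard LoRA
    print(lora_iec_forward(x, a, b))  # prints [2.5, 6.5, 2.5, 6.5]
```

With all-zero A and B, a plain LoRA branch would output zeros, whereas this variant still passes a parameter-free summary of x through to the output, which is one way to read "reusing the original representation".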

Paper Structure

This paper contains 27 sections, 17 equations, 5 figures, 16 tables, and 2 algorithms.

Introduction
Related Work
The Rise of IR-QLoRA
  Preliminaries
  Information Calibration Quantization
    Degeneration of Quantized LLMs
    Information Calibration Quantization for Representation Recovery
  Information Elastic Connection
    Limitation of Finetunable LoRA
    Information Elastic Connection for Information Enhancement
Experiment
  Main Results
  Ablation Study
  Analysis and Discussion
Conclusion
...and 12 more sections

Figures (5)

Figure 1: Overview of IR-QLoRA. The framework includes Information Calibration Quantization (ICQ) for quantizing LLMs and Information Elastic Connection (IEC) for enhancing LoRA
Figure 2: An illustration of ICQ in IR-QLoRA
Figure 3: An illustration of IEC in IR-QLoRA
Figure 4: Entropy of linear projections in LLaMA-7B
Figure 5: Entropy comparison of linear projections in 4-bit LLaMA-7B
