RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization

Xijie Huang; Zechun Liu; Shih-Yang Liu; Kwang-Ting Cheng

TL;DR

This work proposes RoLoRA, the first LoRA-based scheme for effective weight-activation quantization. Experiments show that RoLoRA consistently improves low-bit LoRA convergence and post-training quantization robustness in weight-activation settings.

Abstract

Low-Rank Adaptation (LoRA), a representative Parameter-Efficient Fine-Tuning (PEFT) method, significantly improves training efficiency by updating only a small portion of the weights in Large Language Models (LLMs). Recently, weight-only quantization techniques have also been applied to LoRA methods to reduce the memory footprint of fine-tuning. However, applying weight-activation quantization to the LoRA pipeline is under-explored, and we observe substantial performance degradation, primarily due to the presence of activation outliers. In this work, we propose RoLoRA, the first LoRA-based scheme for effective weight-activation quantization. RoLoRA uses rotation to eliminate outliers and proposes rotation-aware fine-tuning to preserve the outlier-free characteristics of rotated LLMs. Experimental results show that RoLoRA consistently improves low-bit LoRA convergence and post-training quantization robustness in weight-activation settings. We evaluate RoLoRA on LLaMA2-7B/13B and LLaMA3-8B, achieving up to a 29.5% absolute accuracy gain for 4-bit weight-activation quantized LLaMA2-13B on commonsense reasoning tasks compared to the LoRA baseline. We further demonstrate its effectiveness on Large Multimodal Models (LLaVA-1.5-7B). Code is available at https://github.com/HuangOwen/RoLoRA
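
The idea that rotation can remove activation outliers without changing a layer's output can be illustrated with a small, self-contained sketch. This is not the RoLoRA implementation: the normalized Hadamard matrix used as the rotation, the toy activation vector with one injected outlier channel, the per-tensor round-to-nearest quantizer, and all function names are assumptions chosen for illustration only. The sketch checks that (x R)(R^T w) equals x w for an orthogonal R, and compares the kurtosis and 4-bit quantization error of the activations before and after rotation.

```python
import math
import random


def hadamard(n):
    """Orthonormal n x n Hadamard matrix (Sylvester construction, n a power of two)."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]


def vec_times_matrix(x, m):
    """Row vector x multiplied by matrix m."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def kurtosis(x):
    """Fourth standardized moment; large values indicate heavy tails (outliers)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return sum((v - mean) ** 4 for v in x) / len(x) / var ** 2


def fake_quant(x, bits=4):
    """Symmetric per-tensor round-to-nearest quantization, then dequantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in x) / qmax
    return [round(v / scale) * scale for v in x]


def l2_error(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


random.seed(0)
n = 64
R = hadamard(n)

# Toy activation with one outlier channel, and one weight column (both made up).
x = [random.gauss(0.0, 1.0) for _ in range(n)]
x[3] = 60.0
w = [random.gauss(0.0, 1.0) for _ in range(n)]

x_rot = vec_times_matrix(x, R)  # x R
w_rot = vec_times_matrix(w, R)  # entries of R^T w

y = dot(x, w)              # original layer output
y_rot = dot(x_rot, w_rot)  # (x R)(R^T w), equal to y up to rounding

err_orig = l2_error(x, fake_quant(x))
err_rot = l2_error(x_rot, fake_quant(x_rot))

print("output difference:", abs(y - y_rot))
print("kurtosis before/after:", kurtosis(x), kurtosis(x_rot))
print("4-bit quantization error before/after:", err_orig, err_rot)
```

In this toy setting the single large channel forces a coarse quantization step on every other channel, while the rotation spreads its magnitude across all channels so the same 4-bit grid covers a much narrower range.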

Paper Structure (19 sections, 6 equations, 7 figures, 12 tables)

This paper contains 19 sections, 6 equations, 7 figures, 12 tables.

Introduction
Related Work
Preliminary and Motivation
Low-Rank Adaptation (LoRA)
Outlier in Transformer
Eliminating Outlier with Rotation
Method
Applying Rotation
Rotation-aware Fine-tuning
Experiments
Settings
Main Results
Visual Instruction Tuning
Compatibility with other LoRA variants
Ablation Study and Analysis
...and 4 more sections

Figures (7)

Figure 1: Activation distribution before and after rotation. The visualized input activations are selected from layers.1.self_attn.q_proj in LLaMA2-7B.
Figure 2: Overview of the proposed Rotated outlier-free LoRA (RoLoRA)
Figure 3: Two schemes for performing rotation-aware fine-tuning: (a) LAR and (b) LBR.
Figure 4: SVD approximation error of optimization targets with different LoRA-rotation integration schemes.
Figure 5: Left: training dynamics of the average kurtosis of activations. Middle: distribution of the kurtosis of activations across all layers in the final model after fine-tuning with LoRA and RoLoRA. Right: cumulative quantization error of W4A4 GPTQ across all layers in the final model after fine-tuning with LoRA and RoLoRA.
...and 2 more figures
