CE-LoRA: Computation-Efficient LoRA Fine-Tuning for Language Models

Guanduo Chen, Yutong He, Yipeng Hu, Kun Yuan, Binhang Yuan

TL;DR

CE-LoRA tackles the high compute cost of fine-tuning large language models by targeting the activation-gradient computation in the backward pass. It introduces Approximated Matrix Multiplication (AMM) and Double-LoRA to cut compute while preserving LoRA's memory benefits, and adds layer-wise adaptive sparsity to balance accuracy and efficiency. Theoretical analysis shows a convergence rate of $\mathcal{O}(1/\sqrt{T})$ under momentum SGD, and empirical results demonstrate up to $36.3\%$ end-to-end speedup with minimal accuracy loss on reasoning benchmarks. This work provides a practical, scalable approach for computation-efficient fine-tuning of large transformer models.

Abstract

Large Language Models (LLMs) demonstrate exceptional performance across various tasks but demand substantial computational resources, even for fine-tuning. Although Low-Rank Adaptation (LoRA) significantly alleviates memory consumption during fine-tuning, its impact on computational cost reduction is limited. This paper identifies the computation of activation gradients as the primary bottleneck in LoRA's backward propagation and introduces the Computation-Efficient LoRA (CE-LoRA) algorithm, which enhances computational efficiency while preserving memory efficiency. CE-LoRA leverages two key techniques: Approximated Matrix Multiplication, which replaces dense multiplications of large and complete matrices with sparse multiplications involving only critical rows and columns, and the Double-LoRA technique, which reduces error propagation in activation gradients. Theoretically, CE-LoRA converges at the same rate as LoRA, $\mathcal{O}(1/\sqrt{T})$, where $T$ is the number of iterations. Empirical evaluations confirm that CE-LoRA significantly reduces computational costs compared to LoRA without notable performance degradation.
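
The idea of multiplying only "critical rows and columns" can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `approx_matmul`, the `keep` parameter, and the norm-product scoring rule are assumptions chosen for illustration, and the abstract does not specify how CE-LoRA selects the critical indices. The sketch approximates an activation gradient `grad_out @ weight` by keeping only the highest-scoring column/row pairs along the shared inner dimension.

```python
import numpy as np


def approx_matmul(a, b, keep):
    """Approximate a @ b using only `keep` column/row pairs.

    Pair i consists of column a[:, i] and row b[i, :]. The scoring rule
    (product of their Euclidean norms) is an assumed heuristic, not
    necessarily the criterion used by CE-LoRA.
    """
    scores = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=1)
    idx = np.argsort(scores)[-keep:]  # indices of the largest scores
    # Dense product over the reduced inner dimension only.
    return a[:, idx] @ b[idx, :]


rng = np.random.default_rng(0)
grad_out = rng.standard_normal((8, 64))  # gradient w.r.t. the layer output
weight = rng.standard_normal((64, 32))   # frozen weight, assuming Y = X @ weight.T

exact = grad_out @ weight                         # full activation gradient
approx = approx_matmul(grad_out, weight, keep=16)  # uses 16 of 64 pairs
full = approx_matmul(grad_out, weight, keep=64)    # keeps every pair

# Case where only two columns of `a` are nonzero: two pairs suffice.
sparse_a = np.zeros((4, 10))
sparse_a[:, [2, 7]] = rng.standard_normal((4, 2))
dense_b = rng.standard_normal((10, 5))
sparse_exact = sparse_a @ dense_b
sparse_approx = approx_matmul(sparse_a, dense_b, keep=2)
```

With `keep` equal to the inner dimension the result matches the exact product up to floating-point rounding, and when only a few columns carry nonzero mass, keeping those pairs alone recovers the product. In general, the cost of the product scales with `keep` instead of the full inner dimension, at the price of an approximation error.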

Paper Structure

This paper contains 18 sections, 4 theorems, 26 equations, 6 figures, 3 tables, 1 algorithm.

Key Result

Theorem 4.5

Under Assumptions asp:proper - asp:contractive, if $\beta_1\in\left(0,\frac{\delta}{24-12\delta}\right)$ and $\eta\le\min\left\{\frac{L}{2}, \frac{\beta_1}{L}\cdot\sqrt{\frac{\delta}{8}}\right\}$, CE-LoRA with momentum SGD converges as

Figures (6)

  • Figure 1: An illustration of the Approximated Matrix Multiplication (AMM) technique (left) and the CE-LoRA framework (right).
  • Figure 2: Layer-wise Sensitivity Analysis of LLaMA3.2-1B.
  • Figure 3: Empirical validation of \ref{eq:asp-cgk} on MRPC (left), RTE (middle) and CoLA (right).
  • Figure 4: Empirical validation of \ref{eq:asp-ecgk} on MRPC (left), RTE (middle) and CoLA (right).
  • Figure 5: Loss curves for the commonsense reasoning fine-tuning task. Each row in the figure corresponds to a different trainable parameter setting, while each column corresponds to a base model: LLaMA2-7B/13B and LLaMA3.1-8B.
  • ...and 1 more figure

Theorems & Definitions (6)

  • Theorem 4.5
  • Corollary 4.6
  • Lemma A.1
  • proof
  • Theorem A.2
  • proof