LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning

Yiming Shi, Jiwei Wei, Yujia Wu, Ran Ran, Chengwei Sun, Shiyuan He, Yang Yang

TL;DR

Low-rank LDU (LoLDU) is a parameter-efficient fine-tuning (PEFT) approach that reduces the number of trainable parameters by a factor of 2600 compared to regular PEFT methods while maintaining comparable performance.

Abstract

The rapid growth of model scale has necessitated substantial computational resources for fine-tuning. Existing approaches such as Low-Rank Adaptation (LoRA) have sought to address the problem of handling the large number of updated parameters in full fine-tuning. However, LoRA utilizes random initialization and optimization of low-rank matrices to approximate updated weights, which can result in suboptimal convergence and an accuracy gap compared to full fine-tuning. To address these issues, we propose LoLDU, a Parameter-Efficient Fine-Tuning (PEFT) approach that reduces the number of trainable parameters by a factor of 2600 compared to regular PEFT methods while maintaining comparable performance. LoLDU leverages Lower-Diag-Upper Decomposition (LDU) to initialize low-rank matrices for faster convergence and orthogonality. We focus on optimizing the diagonal matrix for scaling transformations. To the best of our knowledge, LoLDU has the fewest parameters among all PEFT approaches. We conducted extensive experiments across 4 instruction-following datasets, 6 natural language understanding (NLU) datasets, 8 image classification datasets, and image generation datasets with multiple model types (LLaMA2, RoBERTa, ViT, and Stable Diffusion), providing a comprehensive and detailed analysis. Our open-source code can be accessed at https://github.com/SKDDJ/LoLDU.
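
As a rough illustration of the decomposition the abstract refers to, the NumPy sketch below factors a square matrix as $W = P L D U$, with $P$ a permutation matrix, $L$ unit lower triangular, $D$ diagonal, and $U$ unit upper triangular. The function name `ldu_decompose`, the choice of partial pivoting, and the random stand-in weight are assumptions made for illustration; this is not the authors' implementation, which lives in the repository linked above.

```python
import numpy as np

def ldu_decompose(W):
    """Factor a square, nonsingular matrix as W = P @ L @ diag(d) @ U.

    L is unit lower triangular and U is unit upper triangular.
    Illustrative Gaussian elimination with partial pivoting.
    """
    n = W.shape[0]
    A = W.astype(float).copy()
    perm = np.arange(n)
    L = np.eye(n)
    for k in range(n - 1):
        # Choose the largest pivot in column k for numerical stability.
        p = k + int(np.argmax(np.abs(A[k:, k])))
        if p != k:
            A[[k, p]] = A[[p, k]]
            perm[[k, p]] = perm[[p, k]]
            L[[k, p], :k] = L[[p, k], :k]
        # Eliminate the entries below the pivot.
        factors = A[k + 1:, k] / A[k, k]
        L[k + 1:, k] = factors
        A[k + 1:, k:] -= np.outer(factors, A[k, k:])
        A[k + 1:, k] = 0.0
    d = np.diag(A).copy()        # diagonal factor D, stored as a vector
    U = A / d[:, None]           # rescale rows so U has a unit diagonal
    P = np.eye(n)[:, perm]       # permutation such that W = P @ L @ D @ U
    return P, L, d, U

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 6))  # hypothetical stand-in for a pre-trained weight
P, L, d, U = ldu_decompose(W)

# The frozen factors P, L, U plus the diagonal d reproduce W.
reconstruction_error = np.abs(P @ L @ np.diag(d) @ U - W).max()
```

In this sketch only the vector `d` would be a candidate for training, which is the sense in which the method focuses on the diagonal matrix while the triangular factors stay fixed.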

Paper Structure

This paper contains 47 sections, 12 equations, 7 figures, 12 tables, 1 algorithm.

Figures (7)

  • Figure 1: Performance vs log-scaled trainable parameters for FGVC (left) and StanfordCars (right) on ViT Base. Our LoLDU methods with $r=\{1,8,16,32,64,128,256,512,768\}$ exhibit superior parameter efficiency and performance when contrasted with Linear Probing (LP, fine-tuning the classifier head only; Chen et al., 2021), FourierFT (Gao et al., 2024; $n=\{3000,10000\}$), LoRA (Hu et al., 2021; $r=16$), and Full Fine-Tuning. LoLDU$_{r=768}$ outperforms LoRA$_{r=16}$ with 96.837% fewer trainable parameters. Particularly noteworthy is that LoLDU with $r=1$ achieves competitive scores with just 24 trainable parameters, while LoLDU with $r=768$ attains the highest accuracy: 42.15% for FGVC and 66.66% for StanfordCars, showcasing the scalability and effectiveness of our approach. Full Fine-Tuning (85.8M parameters) and Linear Probing represent the upper and lower performance bounds, respectively.
  • Figure 2: Comparison of LoRA (left) and our LoLDU (right) method. In LoRA, the tunable parameters are the low-rank ($r$) matrices $A$ and $B$, with $\Delta W = BA$. For each weight $W$, there are $r \times (d_{in} + d_{out})$ trainable parameters. LoLDU, however, optimizes a diagonal matrix for scale transformation, preserving original model knowledge during tuning. The weight update in LoLDU is $\Delta W = \sigma \cdot P \cdot (L_r, \text{diag}(z_r), U_r)$, involving $r+1$ trainable parameters. The permutation matrix $P$, while omitted in this figure for simplicity, is included in Figure 3.
  • Figure 3: Schematic representation of our LoLDU method. The left diagram illustrates the forward pass, demonstrating the transformation of the input $x \in \mathbb{R}^{d_{in}}$ into the output $h \in \mathbb{R}^{d_{out}}$ via a residual subspace matrix $L_{[r:]}D_{[r:]}U_{[r:]}$ and a decomposed subspace matrix $\sigma L_rD_rU_r$. The right diagram shows the initialization process, where the residual matrix is obtained by performing LDU decomposition on the pre-trained weights, then subtracting the top-$r$ submatrices (top-$r$ rows and columns) from the permutation matrix ($P$), lower triangular ($L$), scaled diagonal ($D$), and upper triangular ($U$) matrices. The diagonal matrix is trainable (orange), while the other matrices remain fixed (blue). LoLDU enables efficient adaptation of pre-trained models via low-rank updates, reducing both computational cost and parameter count.
  • Figure 4: Comprehensive Analysis of Rank Ablation Study Results. This figure presents the performance of the ViT-base model on various image classification tasks using the LoLDU method with different ranks. The x-axis shows ranks (1 to 768), and the y-axis indicates accuracy for datasets: FGVC, StanfordCars, CIFAR10, CIFAR100, EuroSAT, and Flowers.
  • Figure 5: Concept Learning Progression in Text-to-Image Generation. Top row: target concept. Subsequent rows: generated images using LoLDU (our method), DreamBooth (Ruiz et al., 2023), and Textual Inversion (Gal et al., 2022), respectively, at training steps 0-600. LoLDU exhibits accelerated convergence, achieving concept acquisition within $\sim$100 steps, surpassing baseline methods in efficiency.
  • ...and 2 more figures
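
The parameter counts in the Figure 2 caption and the forward pass described in the Figure 3 caption can be sketched together in a few lines of NumPy. Everything here is a hedged illustration rather than the authors' code: the helper names (`lora_params`, `loldu_params`, `loldu_forward`), the random frozen factors, the initialization $z_r = D_{[:r]}$ with $\sigma = 1$, and the use of the full permutation matrix $P$ in both the residual and the low-rank term are all assumptions.

```python
import numpy as np

def lora_params(d_in, d_out, r):
    """Trainable parameters per weight for LoRA (A is r x d_in, B is d_out x r)."""
    return r * (d_in + d_out)

def loldu_params(r):
    """Trainable parameters per weight for LoLDU: r diagonal entries plus sigma."""
    return r + 1

def loldu_forward(x, W_res, P, L_r, z_r, U_r, sigma):
    """Compute h = (W_res + sigma * P @ L_r @ diag(z_r) @ U_r) @ x."""
    delta = sigma * (P @ L_r @ np.diag(z_r) @ U_r)
    return (W_res + delta) @ x

d, r = 8, 2
rng = np.random.default_rng(0)

# Hypothetical frozen factors standing in for the LDU factors of a weight.
L = np.tril(rng.standard_normal((d, d)), k=-1) + np.eye(d)  # unit lower triangular
U = np.triu(rng.standard_normal((d, d)), k=1) + np.eye(d)   # unit upper triangular
diag = rng.standard_normal(d)                               # diagonal of D
P = np.eye(d)[rng.permutation(d)]                           # permutation matrix
W = P @ L @ np.diag(diag) @ U

# Residual part built from the remaining d - r components (kept frozen).
W_res = P @ L[:, r:] @ np.diag(diag[r:]) @ U[r:, :]

# Trainable part: the top-r diagonal entries and the scalar sigma (assumed init).
z_r, sigma = diag[:r].copy(), 1.0

x = rng.standard_normal(d)
h = loldu_forward(x, W_res, P, L[:, :r], z_r, U[:r, :], sigma)
```

Under the assumed initialization the split is exact, so `h` matches `W @ x` before any training. For a single $768 \times 768$ weight, `lora_params(768, 768, 16)` gives 24,576 and `loldu_params(768)` gives 769, a per-weight reduction of roughly 96.9%. That is in the same range as the 96.837% quoted in the Figure 1 caption, whose exact value presumably depends on which weights are counted.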