ID-LoRA: Efficient Low-Rank Adaptation Inspired by Matrix Interpolative Decomposition

Xindian Ma, Rundong Kong, Peng Zhang, Ruoxiang Huang, Yongyu Jiang

TL;DR

In multi-task scenarios, ID-LoRA surpasses LoRA and its recent variants on both Code and MMLU tasks, yet requires only 54% of the trainable parameters demanded by the conventional LoRA.

Abstract

LoRA has become a universal Parameter-Efficient Fine-Tuning (PEFT) technique that equips Large Language Models (LLMs) to adapt quickly to new tasks. However, when these models are scaled up, even the latest LoRA variants still introduce considerable overhead in trainable parameters. Conversely, aggressively lowering the rank to curb this overhead markedly degrades performance in complex multi-task settings. We propose ID-LoRA, a novel PEFT framework that breaks the trade-off. Its core innovation lies in extracting and reusing clustered parameter groups from the pretrained weight matrix. These groups are then used to form multiple low-rank components, all of which share only a single initialized trainable low-rank matrix. This approach cuts the number of trainable parameters while keeping the model's capacity intact. We evaluate ID-LoRA on five diverse benchmarks: Mathematical Reasoning, Code Generation, MMLU, CommonsenseQA, and Safety Alignment. ID-LoRA outperforms both full fine-tuning and existing PEFT baselines (e.g., LoRA, DoRA, HydraLoRA) while using up to 46% fewer trainable parameters than the standard LoRA. In multi-task scenarios, it surpasses LoRA and its recent variants (e.g., DoRA and HydraLoRA) on both Code and MMLU tasks, yet requires only 54% of the trainable parameters demanded by the conventional LoRA.
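The parameter saving described in the abstract can be illustrated with a minimal NumPy sketch. It is one possible reading, not the authors' implementation. It assumes the multiple low-rank components are frozen factors $A_l$ taken from rows of the pretrained weight `W0`, and that only a single shared, zero-initialized `B` is trained. The helper `frozen_components` is a hypothetical stand-in for the clustering step (it partitions rows at random), and summing the components in `delta_w` is an assumed combination rule.

```python
import numpy as np

def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Standard LoRA trains both A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r

def shared_b_trainable_params(d_out: int, r: int) -> int:
    """Assumption: every A_l is frozen, so only the shared B (d_out x r) is trained."""
    return d_out * r

def frozen_components(W0: np.ndarray, r: int, k: int, seed: int = 0) -> list:
    """Hypothetical stand-in for the clustering step.

    Randomly partitions the rows of W0 into k groups and keeps r rows
    from each group as a frozen factor A_l of shape (r, d_in).
    """
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(W0.shape[0]), k)
    return [W0[g[:r]].copy() for g in groups]

def delta_w(B: np.ndarray, components: list) -> np.ndarray:
    """Weight update as the sum of B @ A_l (the summation is an assumed rule)."""
    return sum(B @ A_l for A_l in components)

d_out, d_in, r, k = 64, 48, 4, 3
W0 = np.random.default_rng(1).standard_normal((d_out, d_in))
A_list = frozen_components(W0, r, k)   # frozen, derived from pretrained weights
B = np.zeros((d_out, r))               # the only trainable matrix, zero-initialized
update = delta_w(B, A_list)            # all zeros, so the model starts at W0
```

Under these assumptions, a square layer with `d_out == d_in` trains half as many parameters as LoRA at the same rank. The sketch is not meant to reproduce the reported 54% figure. A plain sum of components that share `B` also has rank at most `r`, and the rank-boosting step named in Figure 1 is not modelled here.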

Paper Structure

This paper contains 26 sections, 4 theorems, 32 equations, 4 figures, 8 tables.

Key Result

Theorem 1

Under Assumptions 1 and 2, the clustering-aware decomposition achieves a tighter reconstruction error bound than a global low-rank decomposition. The improvement is governed by $\Delta = \sum_{l=1}^k \sum_{i \in \mathcal{C}_l} \|B(A_{l(i)} - A^{\text{global}}) \|_F^2 \geq 0$, and the inequality becomes strict ($\Delta > 0$) when tasks exhibit clustering structure.
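The quantity $\Delta$ in Theorem 1 can be evaluated numerically. The sketch below is a direct transcription of the double sum. It assumes that $A_{l(i)}$ denotes the factor of the cluster containing task $i$, so every member of a cluster $\mathcal{C}_l$ contributes the same term. The function name `clustering_gap` and its argument layout are illustrative and do not come from the paper.

```python
import numpy as np

def clustering_gap(B, A_clusters, A_global, clusters):
    """Delta = sum_l sum_{i in C_l} ||B (A_l - A_global)||_F^2.

    B          : shared factor, shape (d, r)
    A_clusters : list of per-cluster factors A_l, each of shape (r, n)
    A_global   : single global factor, shape (r, n)
    clusters   : list of task-index lists, one per cluster (the sets C_l)
    """
    total = 0.0
    for A_l, members in zip(A_clusters, clusters):
        diff = B @ (A_l - A_global)
        # Squared Frobenius norm, counted once per task in the cluster.
        total += len(members) * float(np.sum(diff ** 2))
    return total

B = np.eye(2)
A_global = np.zeros((2, 3))
A_clusters = [np.ones((2, 3)), np.zeros((2, 3))]
clusters = [[0, 1], [2]]

gap = clustering_gap(B, A_clusters, A_global, clusters)
no_gap = clustering_gap(B, [A_global, A_global], A_global, clusters)
```

In this toy setup `gap` is positive because the first cluster's factor differs from the global one, while `no_gap` is zero because every cluster factor coincides with `A_global`. This matches the theorem's statement that $\Delta > 0$ only when clustering structure is present.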

Figures (4)

  • Figure 1: Architectural Comparison and Parameter Efficiency: LoRA versus ID-LoRA. (a) LoRA requires training two low-rank matrices: randomly initialized $A \in \mathbb{R}^{r\times d}$ and zero-initialized $B \in \mathbb{R}^{d\times r}$. (b) ID-LoRA employs parameter clustering and rank boosting to generate multiple low-rank components while sharing a single $B$, thereby reducing trainable parameters. (c) Trainable parameters: ID-LoRA achieves $\sim 5\times$ compression versus LoRA at rank $32$ (right) and maintains superior scalability across model sizes (left).
  • Figure 2: A diagram of the ID-LoRA architecture.
  • Figure 3: The inference time and extra memory overhead of different adaptation methods under the same hyperparameter settings as the multi-task experiments on LLaMA-3-8B, tested on an A800 GPU.
  • Figure 4: Performance comparison between ID-LoRA and vanilla LoRA on four representative benchmarks (GSM8K, HumanEval, MMLU, and CommonsenseQA) using LLaMA-3-8B as the backbone. Both methods vary the rank while keeping the number of trainable parameters approximately equal, and results are reported under few-shot or zero-shot settings. Detailed results are provided in the appendix section "Experimental Details of Parameter-Parity Performance".

Theorems & Definitions (10)

  • Definition 1: Pivot Sensitivity
  • Theorem 1: Clustering Reconstruction
  • Theorem 2: Cluster-Pivot Stability
  • Definition 2: Pivot Sensitivity
  • Definition 3: Task Parameter Distance
  • Definition 4: Cluster Low-Rank Decomposition
  • Definition 5: Multi-Task Reconstruction
  • Definition 6: CUR Decomposition
  • Theorem 3: Clustering Reconstruction
  • Theorem 4: Cluster-Pivot Stability