CoLA: Collaborative Low-Rank Adaptation

Yiyun Zhou, Chang Yao, Jingyuan Chen

TL;DR

CoLA introduces a flexible LoRA framework that breaks the rigid low-rank factor design of traditional LoRA by allowing the numbers of $A$ and $B$ matrices to differ and by optimizing how they collaborate. It extends PiSSA initialization to CoLA, distributing the principal components across the multiple $A_i$ and $B_j$ matrices and freezing the residual directions, so that the essential directions are captured even when data are scarce. Through three collaborative strategies (Fully Collaborative, Random Collaborative, and Heuristic Collaborative), CoLA shows improved generalization and robustness, particularly in low-sample, multi-domain settings on Llama backbones. The approach achieves strong gains over FFT and various PEFT baselines and offers flexible energy-usage trade-offs; the code and data are publicly released for reproducibility.
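The TL;DR names the three collaborative strategies but does not spell out how each one pairs the $A_i$ and $B_j$ matrices. The sketch below is a minimal illustration, assuming $W \in \mathbb{R}^{n \times m}$, down-projections $A_i \in \mathbb{R}^{r \times m}$, up-projections $B_j \in \mathbb{R}^{n \times r}$, and an update formed by summing $B_j A_i$ over a set of pairs. The function name `cola_delta` and all three pairing rules are hypothetical readings of the strategy names, not the paper's definitions.

```python
import numpy as np

def cola_delta(As, Bs, strategy="full", pairs=None, rng=None):
    """Combine M down-projections A_i (r x m) and N up-projections B_j (n x r)
    into one weight update of shape (n x m).

    The three pairing rules below are HYPOTHETICAL readings of the strategy
    names, not the paper's exact definitions.
    """
    M, N = len(As), len(Bs)
    if strategy == "full":
        # Fully Collaborative (assumed): every A_i is paired with every B_j.
        pairs = [(i, j) for i in range(M) for j in range(N)]
    elif strategy == "random":
        # Random Collaborative (assumed): each A_i is paired with one random B_j.
        if rng is None:
            rng = np.random.default_rng(0)
        pairs = [(i, int(rng.integers(N))) for i in range(M)]
    elif strategy == "heuristic":
        # Heuristic Collaborative (assumed): the caller supplies a fixed pairing.
        if pairs is None:
            raise ValueError("the heuristic strategy needs explicit (i, j) pairs")
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    n, m = Bs[0].shape[0], As[0].shape[1]
    delta = np.zeros((n, m))
    for i, j in pairs:
        delta += Bs[j] @ As[i]  # each pair adds a rank-<=r term
    return delta

# Usage: the frozen weight W is adapted as y = (W + delta) x.
rng = np.random.default_rng(1)
n, m, r = 4, 5, 2
W = rng.standard_normal((n, m))
As = [rng.standard_normal((r, m)) for _ in range(3)]  # M = 3 matrices A_i
Bs = [rng.standard_normal((n, r)) for _ in range(2)]  # N = 2 matrices B_j
x = rng.standard_normal(m)
y = (W + cola_delta(As, Bs, "full")) @ x
```

Under the assumed fully collaborative rule the update factorizes as $(\sum_j B_j)(\sum_i A_i)$, so its rank stays at most $r$ even though $M \neq N$; the random and heuristic variants trade that structure for sparser pairings.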

Abstract

The scaling law of Large Language Models (LLMs) reveals a power-law relationship, showing diminishing returns on performance as model scale increases. While training LLMs from scratch is resource-intensive, fine-tuning a pre-trained model for specific tasks has become a practical alternative. Full fine-tuning (FFT) achieves strong performance; however, it is computationally expensive and inefficient. Parameter-efficient fine-tuning (PEFT) methods, like LoRA, have been proposed to address these challenges by freezing the pre-trained model and adding lightweight task-specific modules. LoRA, in particular, has proven effective, but its application to multi-task scenarios is limited by interference between tasks. Recent approaches, such as Mixture-of-Experts (MoE) and asymmetric LoRA, have aimed to mitigate these issues but still struggle with sample scarcity and noise interference due to their fixed structure. In response, we propose CoLA, a more flexible LoRA architecture with an efficient initialization scheme, and introduce three collaborative strategies that enhance performance by better exploiting the quantitative relationship between the numbers of matrices $A$ and $B$. Our experiments demonstrate the effectiveness and robustness of CoLA, which outperforms existing PEFT methods, especially in low-sample scenarios. Our data and code are publicly available at https://github.com/zyy-2001/CoLA.
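To make "lightweight task-specific modules" concrete, the arithmetic below compares trainable parameters for a single frozen weight under FFT, standard LoRA ($W + BA$), and a CoLA-style adapter with unequal numbers of $A$ and $B$ matrices. It assumes every $A_i$ is $r \times m$ and every $B_j$ is $n \times r$ with a shared rank $r$; that sizing, the helper name `trainable_params`, and the 4096 x 4096, $r = 8$ example are illustrative choices, not figures taken from the paper.

```python
def trainable_params(n, m, r, num_A=1, num_B=1):
    """Trainable parameters for one n x m weight.

    FFT updates all n * m entries. A LoRA/CoLA-style adapter (assumed sizing)
    trains num_A matrices A_i of shape r x m and num_B matrices B_j of shape n x r.
    """
    fft = n * m
    adapter = num_A * r * m + num_B * n * r
    return fft, adapter

fft, lora = trainable_params(4096, 4096, 8)                 # 16777216, 65536
_, cola = trainable_params(4096, 4096, 8, num_A=2, num_B=3)  # 163840
print(f"LoRA trains {lora / fft:.2%} of the FFT parameters")
```

Even with two $A$ and three $B$ matrices, the adapter in this example stays around 1% of the full matrix, which is why varying $M$ and $N$ is cheap relative to FFT.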

Paper Structure

This paper contains 21 sections, 1 theorem, 6 equations, 6 figures, 8 tables.

Key Result

Theorem 3.1

If the SVD of $W \in \mathbb{R}^{n \times m}$ is $W = U S V^{\top}$, then the optimal rank-$r$ approximation of $W$ (in both the Frobenius and spectral norms) is $U_{[:n, :r]} S_{[:r, :r]} V_{[:m, :r]}^{\top}$.
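Theorem 3.1 is what justifies PiSSA-style initialization: the top-$r$ singular triplets give the best rank-$r$ part of $W$, which can be trained while the residual is frozen. The sketch below first checks the theorem numerically (the Frobenius error of the truncation equals the norm of the discarded singular values), then builds PiSSA factors $B = U_r S_r^{1/2}$, $A = S_r^{1/2} V_r^{\top}$. How CoLA actually distributes these factors across multiple $A_i$ and $B_j$ is not stated here; the even split $A_i = A / M$, $B_j = B / N$ and the name `pissa_cola_init` are hypothetical, chosen so that $(\sum_j B_j)(\sum_i A_i) = BA$.

```python
import numpy as np

def rank_r_approx(W, r):
    """Best rank-r approximation per Theorem 3.1: keep the top-r singular triplets."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :], S

def pissa_cola_init(W, r, num_A, num_B):
    """HYPOTHETICAL PiSSA-style initialization for CoLA.

    Principal factors follow PiSSA: B = U_r sqrt(S_r), A = sqrt(S_r) V_r^T,
    so B @ A is the rank-r approximation of W. The even split A_i = A / num_A,
    B_j = B / num_B is an assumption chosen so that (sum_j B_j)(sum_i A_i) = B @ A.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(S[:r])
    B = U[:, :r] * sqrt_s            # n x r: column k scaled by sqrt(s_k)
    A = sqrt_s[:, None] * Vt[:r, :]  # r x m: row k scaled by sqrt(s_k)
    W_res = W - B @ A                # residual directions, kept frozen
    As = [A / num_A for _ in range(num_A)]
    Bs = [B / num_B for _ in range(num_B)]
    return W_res, As, Bs

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 5))
W_r, S = rank_r_approx(W, 2)
# Eckart-Young: the Frobenius error equals the norm of the discarded singular values.
err = np.linalg.norm(W - W_r)
W_res, As, Bs = pissa_cola_init(W, r=2, num_A=2, num_B=3)
```

At initialization the frozen residual plus the trainable principal part reconstructs $W$ exactly, so the adapted model starts from the pre-trained weights, as in PiSSA.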

Figures (6)

  • Figure 1: The comparison between Full Fine-tuning and different LoRA variant structures.
  • Figure 2: Overview of CoLA with three collaborative strategies.
  • Figure 3: The impact of PiSSA initialization on LoRA and CoLA based on Llama-3.1-8B in the generality domain when the sample size is reduced.
  • Figure 4: The performance of Llama-3.1-8B when the number of matrices $A$ and $B$ in CoLA differs across the domains of law, medicine, math, and finance.
  • Figure 5: Energy consumption of three collaborative strategies based on Llama-3.1-8B.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Theorem 3.1