Collaborative and Efficient Fine-tuning: Leveraging Task Similarity

Gagik Magakyan; Amirhossein Reisizadeh; Chanwoo Park; Pablo A. Parrilo; Asuman Ozdaglar

TL;DR

This work addresses data scarcity in fine-tuning foundation models across multiple downstream tasks by proposing CoLoRA, a collaborative and parameter-efficient extension of LoRA. It learns a shared global adapter that captures task similarity while maintaining personalized per-task adapters, for a total parameter complexity of $O(dr + kr^2)$. The authors analyze the method on a heterogeneous linear regression model and give guarantees, based on a Generalized Restricted Isometry Property (GRIP), for an alternating minimization scheme, showing convergence under sufficient task similarity. Empirically, CoLoRA yields substantial gains over federated baselines, especially when tasks are related, which supports its use for scalable, distributed model personalization.
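To make the $O(dr + kr^2)$ count concrete, here is a minimal sketch. It assumes one plausible parametrization that is not spelled out in this summary: two shared factors `B` ($d \times r$) and `A` ($r \times d$), plus one small $r \times r$ core `C_i` per task, with $k$ tasks in total. The function and variable names are hypothetical and chosen for illustration only.

```python
import numpy as np

def colora_param_count(d: int, r: int, k: int) -> int:
    """Assumed layout: shared B (d x r) and A (r x d), plus one r x r core per task."""
    return 2 * d * r + k * r * r

def independent_lora_param_count(d: int, r: int, k: int) -> int:
    """Baseline for comparison: k separate LoRA adapters, each with its own B and A."""
    return 2 * d * r * k

def adapted_weight(W0: np.ndarray, B: np.ndarray, C_i: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Weight for task i under the assumed layout: frozen W0 plus a rank-r update."""
    return W0 + B @ C_i @ A

d, r, k = 1024, 8, 10
print(colora_param_count(d, r, k))            # 2*1024*8 + 10*8*8 = 17024
print(independent_lora_param_count(d, r, k))  # 2*1024*8*10 = 163840
```

Under these assumptions the shared factors dominate the cost, and each additional task adds only $r^2$ trainable parameters rather than a full $2dr$ adapter.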

Abstract

Adaptability is regarded as a central feature of foundation models, enabling them to effectively acclimate to unseen downstream tasks. Parameter-efficient fine-tuning methods such as the celebrated LoRA facilitate efficient adaptation of large foundation models using labeled, high-quality, and generally scarce task data. To mitigate data scarcity in fine-tuning foundation models, we propose to leverage task similarity across multiple downstream users. Intuitively, users with similar tasks should be able to assist each other by boosting the effective fine-tuning data size. We propose Collaborative Low-Rank Adaptation, or CoLoRA, which exploits task similarity to collaboratively and efficiently fine-tune personalized foundation models. The main idea in CoLoRA is to train one shared adapter that captures the underlying similarities across all tasks, together with personalized adapters tailored to user-specific tasks. We theoretically study CoLoRA on heterogeneous linear regression and provide provable guarantees for ground truth recovery. We also conduct several natural language experiments with varying task similarity, which further demonstrate that individual performance is significantly boosted when a task is trained together with similar tasks.

Paper Structure (60 sections, 25 theorems, 243 equations, 5 figures, 5 tables, 2 algorithms)

Introduction
Main contributions.
Preliminaries
LoRA
Adapting to multiple tasks
Task similarity
Collaborative Low-Rank Adaptation
Related work.
Federated LoRA.
Linear Representation Learning.
Theoretical Understanding of CoLoRA with Linear Regression
Our approach: Collaborative AltMin
Theoretical results
Generalized Restricted Isometry Property
CoLoRA in Experiments
...and 45 more sections

Key Result

Theorem 1

Assume that the large and small batch sizes satisfy […], where $\varrho = \max(\kappa, \gamma)$ and $\widetilde{\Theta}(\cdot)$ hides logarithmic factors. Moreover, suppose that the task similarity $\xi$ is large enough such that […]. Then for any $\varepsilon > 0$ and with high probability, CoAltMin recovers $\bm{U}^*,\bm{V}^*$ after $T = \Theta(\log(1/\varepsilon))$ iterations with […].
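The iteration count $T = \Theta(\log(1/\varepsilon))$ is what one would expect from a geometric (linear-rate) contraction of the recovery error. As a sketch only, assume a hypothetical per-iteration contraction factor $c \in (0,1)$ and an error measure $\mathrm{dist}_t$; neither is specified in this summary, and the actual constants in the theorem may differ.

```latex
\mathrm{dist}_{t+1} \le c\,\mathrm{dist}_t
\;\Longrightarrow\;
\mathrm{dist}_T \le c^{T}\,\mathrm{dist}_0 \le \varepsilon
\quad\text{whenever}\quad
T \ge \frac{\log(\mathrm{dist}_0/\varepsilon)}{\log(1/c)}.
```

For instance, with $c = 1/2$ and $\mathrm{dist}_0 = 1$, reaching $\varepsilon = 10^{-6}$ takes $T = 20$ iterations, since $2^{-20} \approx 9.5 \times 10^{-7}$.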

Figures (5)

Figure 1: Column subspace similarity averaged across all layers.
Figure 2: CoLoRA for Task $T$: Given a list of integers, remove all the even elements. Each point shows task $T$'s ROUGE-L score when it is trained jointly with three other specific tasks. The black dashed line indicates the score of task $T$ when it is trained alone.
Figure 3: Performance of CoLoRA for a fixed task with respect to different levels of similarity. The dashed line is the performance when only using local data.
Figure 4: Performance difference between CoLoRA and baseline methods. For each experiment, we compute the average performance of all clients on their respective tasks and plot the difference between CoLoRA and each baseline. A positive score difference indicates superior performance of CoLoRA over the baseline.
Figure 5: Similarity across network layers. Smaller matrix indices correspond to parameters from earlier layers in the network. Red points denote the similarity between tasks $T_1$ and $T_2$, while blue points correspond to $T_1$ and $T_3$.
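Figures 1 and 5 report a column subspace similarity, but the paper's Definition 1 is not reproduced in this summary. The sketch below therefore uses one common choice as an assumption: the normalized squared Frobenius norm of the overlap between the top-$r$ left singular subspaces of two matrices, which lies in $[0, 1]$. The function name and the choice of normalization are hypothetical and may differ from the paper's definition.

```python
import numpy as np

def column_subspace_similarity(M1: np.ndarray, M2: np.ndarray, r: int) -> float:
    """Assumed measure: ||U1^T U2||_F^2 / r for the top-r left singular vectors."""
    U1 = np.linalg.svd(M1, full_matrices=False)[0][:, :r]
    U2 = np.linalg.svd(M2, full_matrices=False)[0][:, :r]
    return float(np.linalg.norm(U1.T @ U2, "fro") ** 2 / r)

rng = np.random.default_rng(0)
basis = np.eye(6)
# Two rank-2 matrices whose column spaces are orthogonal to each other.
M_a = basis[:, :2] @ rng.normal(size=(2, 5))
M_b = basis[:, 2:4] @ rng.normal(size=(2, 5))

print(column_subspace_similarity(M_a, M_a, r=2))  # close to 1: identical subspaces
print(column_subspace_similarity(M_a, M_b, r=2))  # close to 0: orthogonal subspaces
```

Averaging this quantity over the weight matrices of all layers would give a single per-pair score of the kind summarized in Figure 1, under the same assumptions.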

Theorems & Definitions (32)

Definition 1: Column subspace similarity
Definition 2: Task similarity
Theorem 1
Corollary 1
Definition 3: RIP [candes2005decoding, recht2010guaranteed]
Definition 4: GRIP
Proposition 1
Definition 5
Theorem 2
Lemma 1
...and 22 more
