Table of Contents
Fetching ...

Model merging with SVD to tie the Knots

George Stoica, Pratik Ramesh, Boglarka Ecsedi, Leshem Choshen, Judy Hoffman

TL;DR

KnOTS uses the SVD to jointly transform the weights of different LoRA models into an aligned space, where existing merging methods can be applied, and introduces a new benchmark that explicitly evaluates whether merged models are general models.

Abstract

Recent model merging methods demonstrate that the parameters of fully-finetuned models specializing in distinct tasks can be combined into one model capable of solving all tasks without retraining. Yet, this success does not transfer well when merging LoRA finetuned models. We study this phenomenon and observe that the weights of LoRA finetuned models showcase a lower degree of alignment compared to their fully-finetuned counterparts. We hypothesize that improving this alignment is key to obtaining better LoRA model merges, and propose KnOTS to address this problem. KnOTS uses the SVD to jointly transform the weights of different LoRA models into an aligned space, where existing merging methods can be applied. In addition, we introduce a new benchmark that explicitly evaluates whether merged models are general models. Notably, KnOTS consistently improves LoRA merging by up to 4.3% across several vision and language benchmarks, including our new setting. We release our code at: https://github.com/gstoica27/KnOTS.

Model merging with SVD to tie the Knots

TL;DR

KnOTS uses the SVD to jointly transform the weights of different LoRA models into an aligned space, where existing merging methods can be applied, and introduces a new benchmark that explicitly evaluates whether merged models are general models.

Abstract

Recent model merging methods demonstrate that the parameters of fully-finetuned models specializing in distinct tasks can be combined into one model capable of solving all tasks without retraining. Yet, this success does not transfer well when merging LoRA finetuned models. We study this phenomenon and observe that the weights of LoRA finetuned models showcase a lower degree of alignment compared to their fully-finetuned counterparts. We hypothesize that improving this alignment is key to obtaining better LoRA model merges, and propose KnOTS to address this problem. KnOTS uses the SVD to jointly transform the weights of different LoRA models into an aligned space, where existing merging methods can be applied. In addition, we introduce a new benchmark that explicitly evaluates whether merged models are general models. Notably, KnOTS consistently improves LoRA merging by up to 4.3% across several vision and language benchmarks, including our new setting. We release our code at: https://github.com/gstoica27/KnOTS.

Paper Structure

This paper contains 47 sections, 2 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: The KnOTS method for merging "task-updates" from an arbitrary layer-$j$ of different models. Each weight-update is denoted by $\Delta W_j^{(i)}$, where "$i$" is the $i^{th}$ model. KnOTS first concatenates the updates together and applies the SVD, to obtain $U, \Sigma$ and a set of concatenated $V^{(i)}$ matrices that each correspond to a particular task. KnOTS then merges the $V$'s into a single $V^{(merged)}$ matrix. Finally, KnOTS multiplies the $U, \Sigma$ and $V^{(merged)}$ to obtain a merged-update to be added to the pretrained model.
  • Figure 2: Finetuning strategy impacts representation alignments between models trained on different tasks. The figure shows the average pairwise centered kernel alignment (CKA---kornblith2019similarity) between the outputs solely given by finetuning updates (e.g., a $\Delta W_j^{(i)}$) across every attention layer, from models finetuned on different tasks with different strategies (defined in § \ref{['sec:per_task']}). High CKA indicates that the task-updates of different models are aligned. (a) Full-rank finetuned models exhibit high CKA and alignment. (b) LoRA finetuned models are drastically less aligned. (c) However, they are dramatically more aligned with KnOTS.
  • Figure 2: Normalized per-task avg. image classification results. We merge eight ViT-L/14 models finetuned with LoRA on eight vision datasets. We report the average normalized accuracies against average absolute accuracy of the finetuned models: 92.3%. KnOTS is the best.
  • Figure 3: Normalized per-task avg. NLI results. We merge six Llama3-8B models finetuned with LoRA on different NLI datasets. All numbers are normalized against the absolute average per-task accuracy of the individual finetuned models: 92.9%. KnOTS performs best.
  • Figure 3: KnOTS boosts performance with scale. KnOTS-TIES continues to see gains, outperforming original TIES yadav2023resolving and Task Arithmetic (TA) ilharco2023editing when merging an increasing number of tasks in the per-task evaluation vision setting § \ref{['sec:per_task']}. Performance is the average normalized accuracy with 95% confidence intervals over merging different combinations of the tasks
  • ...and 1 more figures