CopRA: A Progressive LoRA Training Strategy

Zhan Zhuang, Xiequn Wang, Yulong Zhang, Wei Li, Yu Zhang, Ying Wei

TL;DR

This work proposes a novel progressive training strategy for LoRA with random layer dropping that exhibits linear mode connectivity, which enables efficient model merging and paves the way for federated learning and multi-task learning via LoRA merging.

Abstract

Low-Rank Adaptation (LoRA) is a parameter-efficient technique for rapidly fine-tuning foundation models. In standard LoRA training dynamics, models tend to quickly converge to a local optimum near the initialization. However, this local optimum may not be ideal for out-of-distribution data or tasks such as merging and pruning. In this work, we propose a novel progressive training strategy for LoRA with random layer dropping. This strategy also optimizes the Shapley value of LoRA parameters in each layer, treating each layer as a player in a cooperative game. We refer to this method as Cooperative LoRA (CopRA). Our experimental results demonstrate that parameters trained with CopRA exhibit linear mode connectivity, which enables efficient model merging. This also paves the way for federated learning and multi-task learning via LoRA merging. Additionally, by optimizing the Shapley value, CopRA shows superior performance in pruning tasks.
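To make the idea of progressive training with random layer dropping concrete, here is a minimal sketch in plain Python. The abstract does not specify the schedule or the sampling rule, so everything below is an assumption made for illustration: the linear ramp in `keep_probability`, the `ramp_fraction` parameter, the independent per-layer Bernoulli sampling, and the toy scalar "network" in `forward` are all hypothetical and are not taken from the paper.

```python
import random


def keep_probability(step, total_steps, ramp_fraction=0.75):
    """Hypothetical schedule: the chance of keeping a layer's LoRA module
    grows linearly from 0 to 1 over the first `ramp_fraction` of training,
    then stays at 1 so that all layers are trained jointly at the end."""
    ramp_steps = max(1, int(total_steps * ramp_fraction))
    return min(1.0, step / ramp_steps)


def sample_active_layers(num_layers, keep_prob, rng):
    """Independently keep each layer's LoRA module with probability keep_prob.
    (Assumed sampling rule; dropped layers fall back to the frozen base weights.)"""
    return [rng.random() < keep_prob for _ in range(num_layers)]


def forward(x, base_weights, lora_deltas, active):
    """Toy scalar stand-in for a network: layer i multiplies its input by
    base_weights[i], plus lora_deltas[i] only when that layer is active."""
    for w, delta, is_active in zip(base_weights, lora_deltas, active):
        x = x * (w + (delta if is_active else 0.0))
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    total_steps = 8
    for step in range(total_steps + 1):
        p = keep_probability(step, total_steps)
        mask = sample_active_layers(4, p, rng)
        print(f"step={step} keep_prob={p:.2f} active={mask}")
```

Under this assumed schedule, early steps train only small random subsets of layers, and the final steps train every layer together, which matches the "progressive" description only in spirit.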

Paper Structure

This paper contains 11 sections, 5 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Illustration of (left) LoRA merging and (right) the training strategy of CopRA. Different colors represent parameters from different seeds or tasks, while grey indicates inactive parameters.
  • Figure 2: Approximated Shapley value of each layer.
  • Figure 3: Visualization of accuracy landscape across different methods for the CLIP datasets. The X-axis represents the interpolation coefficient, while the Y-axis indicates accuracy (%).
  • Figure 4: (left) Structured pruning to various components, including layers (every other, low, middle, high) and attention elements. (right) Unstructured pruning with varying levels of sparsity.
  • Figure 5: (left) t-SNE visualization with different seeds and learning rates. (middle) Merging accuracy across different learning rates. (right) Merging results on the MTL15 dataset.
  • ...and 1 more figure
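The accuracy landscape in Figure 3 is plotted against an interpolation coefficient, which is the usual way to probe linear mode connectivity. The sketch below shows one way such a check could be set up. It is an assumption-laden illustration, not the paper's evaluation code: the function names, the dict-of-lists parameter format, the choice to interpolate raw parameter values, and the `evaluate` callback are all hypothetical.

```python
def interpolate(params_a, params_b, alpha):
    """Return (1 - alpha) * params_a + alpha * params_b, key by key.
    Both inputs map a parameter name to a flat list of floats."""
    return {
        name: [(1 - alpha) * a + alpha * b
               for a, b in zip(params_a[name], params_b[name])]
        for name in params_a
    }


def accuracy_along_path(params_a, params_b, evaluate, num_points=5):
    """Evaluate the interpolated parameters at evenly spaced coefficients.
    `evaluate` is a user-supplied function (assumed) that maps parameters
    to an accuracy-like score."""
    alphas = [i / (num_points - 1) for i in range(num_points)]
    return [(alpha, evaluate(interpolate(params_a, params_b, alpha)))
            for alpha in alphas]


def barrier(curve):
    """Largest amount by which the curve falls below the straight line
    joining its two endpoint scores. A value near 0 is the signature of
    linear mode connectivity between the two endpoints."""
    start, end = curve[0][1], curve[-1][1]
    return max(0.0, max((1 - alpha) * start + alpha * end - score
                        for alpha, score in curve))


if __name__ == "__main__":
    a = {"w": [0.0, 2.0]}
    b = {"w": [2.0, 4.0]}
    # Toy score: the sum of the parameters (a stand-in for real accuracy).
    curve = accuracy_along_path(a, b, lambda p: sum(p["w"]), num_points=3)
    print(curve, barrier(curve))
```

With a real model, `evaluate` would load the interpolated LoRA parameters and measure held-out accuracy. The midpoint (`alpha = 0.5`) corresponds to simple averaging of two LoRA modules, which is the merging setting the TL;DR refers to.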