$\textit{Trans-LoRA}$: towards data-free Transferable Parameter Efficient Finetuning

Runqian Wang; Soumya Ghosh; David Cox; Diego Antognini; Aude Oliva; Rogerio Feris; Leonid Karlinsky


TL;DR

Trans-LoRA tackles the practical problem of transferring LoRA-based PEFT adapters across base models without access to the original task data. It introduces a synthetic-data curriculum guided by a discriminator, enabling knowledge distillation from a source LoRA on a source model to a target LoRA on a new base model, while preserving data privacy. Across Llama2 and Gemma families and multiple PEFT variants on BBH, MMLU, GSM8K, and MBPP, Trans-LoRA achieves lossless or improved transfer, often outperforming both the source LoRA and the target base model, and shows scalable gains as synthetic data increases. The approach promises practical benefits for large-scale cloud customization by enabling near data-free, robust LoRA transfers during base-model upgrades, with some computational overhead and potential task-understanding limitations as noted in the analysis.

Abstract

Low-rank adapters (LoRA) and their variants are popular parameter-efficient fine-tuning (PEFT) techniques that closely match the performance of full-model fine-tuning while requiring only a small number of additional parameters. These additional LoRA parameters are specific to the base model being adapted. When the base model is deprecated and replaced with a new one, all the associated LoRA modules must be re-trained. Such re-training requires access to the data used to train the LoRA for the original base model. This is especially problematic for commercial cloud applications, where the LoRA modules and the base models are hosted by service providers who may not be allowed to host proprietary client task data. To address this challenge, we propose $\textit{Trans-LoRA}$ -- a novel method for lossless, nearly data-free transfer of LoRAs across base models. Our approach relies on synthetic data to transfer LoRA modules. Using large language models, we design a synthetic data generator to approximate the data-generating process of the $\textit{observed}$ task data subset. Training on the resulting synthetic dataset transfers LoRA modules to new models. We show the effectiveness of our approach using both the Llama and Gemma model families. Our approach achieves lossless (mostly improved) LoRA transfer between models within and across different base model families, and even between different PEFT methods, on a wide variety of tasks.


Paper Structure (26 sections, 1 equation, 9 figures, 8 tables, 1 algorithm)


Introduction
Related Work
Parameter Efficient Finetuning (PEFT)
Knowledge Distillation (KD)
Synthetic Data
Trans-LoRA
Capabilities transfer through knowledge distillation on synthetic data
Experiments
Experimental Setup
Main Results
Ablation Experiments
Distillation Data
Other PEFT Methods
Continuous Transfer
Scaling the amount of Synthetic Samples
...and 11 more sections

Figures (9)

Figure 1: Trans-LoRA overview. Examples from 'boolean expressions' BBH task illustrate the lower diversity of raw synthetic samples compared to the original task data, which is fixed by our filtering approach. The source model is used to: 1. train the source LoRA; 2. synthesize data for discriminator training; and 3. train the (LoRA) discriminator. Then, the target model is used to synthesize data for transfer (filtered by discriminator) and train target LoRA using the source LoRA teacher.
Figure 2: Detailed breakdown of Trans-LoRA. Task Finetuning is done beforehand and produces the source LoRA for the source model and the discriminator. Task Transfer utilizes the source LoRA and discriminator to transfer the LoRA onto the target model and produce the target LoRA.
Figure 3: Trans-LoRA
Figure 4: Transferred LoRA accuracy vs. source LoRA accuracy on MMLU tasks. Details the rows of the MMLU results table (tab:mmlu). Bottom left: row 3; Bottom right: row 4.
Figure 5: Scaling the number of synthetic samples generated through Trans-LoRA. Total training iterations in each experiment are kept identical for fair comparison. Done on BBH with Gemma-2b to Gemma-7b transfer and Gemma-2b as discriminator.
...and 4 more figures
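
The transfer step described in the Figure 1 and Figure 2 captions (synthesize prompts, filter them with the discriminator, label them with the source LoRA teacher) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `toy_generator`, `toy_discriminator`, and `toy_teacher` are hypothetical stand-ins for the target model, the LoRA discriminator, and the source model with its source LoRA. The 0.5 acceptance threshold and the `max_attempts` cap are also assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple

# Vocabulary of the toy "boolean expressions" task (illustrative only).
TOKENS = {"True", "False", "and", "or", "not"}


def toy_generator(rng: random.Random) -> str:
    """Hypothetical stand-in for the target model synthesizing task prompts."""
    # Occasionally emit an off-task sample to mimic low-quality raw synthesis.
    if rng.random() < 0.3:
        return "What is 2 + 2 ?"
    a = rng.choice(["True", "False"])
    b = rng.choice(["True", "False"])
    op = rng.choice(["and", "or"])
    neg = rng.choice(["", "not "])
    return f"{neg}{a} {op} {b}"


def toy_discriminator(prompt: str) -> float:
    """Hypothetical stand-in for the LoRA discriminator: 1.0 if on-task, else 0.0."""
    return 1.0 if all(tok in TOKENS for tok in prompt.split()) else 0.0


def toy_teacher(prompt: str) -> str:
    """Hypothetical stand-in for the source model + source LoRA labeling a prompt."""
    # Safe here only because the discriminator admits nothing but boolean tokens.
    return str(eval(prompt, {"__builtins__": {}}, {}))


def build_transfer_set(
    generate: Callable[[], str],
    discriminator: Callable[[str], float],
    teacher: Callable[[str], str],
    n_samples: int,
    threshold: float = 0.5,      # assumed acceptance threshold
    max_attempts: int = 10_000,  # assumed cap so the loop always terminates
) -> List[Tuple[str, str]]:
    """Collect (prompt, teacher_label) pairs from discriminator-approved synthetic prompts."""
    kept: List[Tuple[str, str]] = []
    attempts = 0
    while len(kept) < n_samples and attempts < max_attempts:
        attempts += 1
        prompt = generate()
        if discriminator(prompt) >= threshold:
            kept.append((prompt, teacher(prompt)))
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = build_transfer_set(
        lambda: toy_generator(rng), toy_discriminator, toy_teacher, n_samples=5
    )
    for prompt, label in pairs:
        print(f"{prompt!r} -> {label}")
```

In the full method, the resulting pairs would serve as distillation data for training the target LoRA on the target base model; that training step is omitted here because it depends on the specific model and PEFT library in use.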
