Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy?

Nader Asadi, Mahdi Beitollahi, Yasser Khalil, Yinchuan Li, Guojun Zhang, Xi Chen

TL;DR

This work investigates the composability of parameter-efficient LoRA modules for few-shot transfer to unseen tasks. It analyzes two simple strategies, uniform averaging and learned interpolation, across vision and language models. Both improve few-shot transfer, and in full-shot settings learned composition performs comparably to regular LoRA training while using far fewer trainable parameters. The authors demonstrate robustness across label, task, and covariate shifts, and CKA analyses indicate that learned composition assigns larger weights to upstream modules whose tasks are more similar to the downstream task. Overall, the findings highlight the potential of modular, add-on adapters to enhance transferability without full re-tuning, with implications for scalable and reusable foundation-model adaptation.

Abstract

Parameter-efficient fine-tuning stands as the standard for efficiently fine-tuning large language and vision models on downstream tasks. Specifically, the efficiency of low-rank adaptation has facilitated the creation and sharing of hundreds of custom LoRA modules, each trained on distinct data from various downstream tasks. In this paper, we explore the composability of LoRA modules, examining whether combining these pre-trained modules enhances generalization to unseen downstream tasks. Our investigation evaluates two approaches: (a) uniform composition, which averages upstream LoRA modules with equal weights, and (b) learned composition, where we learn a weight for each upstream module and perform weighted averaging. Our experimental results on both vision and language models reveal that in few-shot settings, where only a limited number of samples are available for the downstream task, both uniform and learned composition methods result in better transfer accuracy, outperforming full fine-tuning and training a LoRA from scratch. Moreover, in full-shot settings, learned composition performs comparably to regular LoRA training with significantly fewer trainable parameters. Our research unveils the potential of uniform composition for enhancing transferability in low-shot settings, without introducing additional learnable parameters.
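
The two composition strategies described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each upstream module for a layer is a pair of low-rank factors (A, B), that the factors are averaged separately rather than averaging the products BA, and that learned weights are normalized with a softmax; none of these details are stated in the text above. The names `compose_lora`, `weighted_average`, and `softmax` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax: positive weights that sum to 1 (an assumption)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_average(matrices, weights):
    """Element-wise weighted sum of equally shaped matrices (lists of rows)."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(cols)]
        for i in range(rows)
    ]


def compose_lora(modules, logits=None):
    """Compose the upstream LoRA modules of one layer.

    modules: list of (A, B) pairs; hypothetical layout with A of shape
             (r, d_in) and B of shape (d_out, r).
    logits:  None selects uniform composition (no trainable parameters);
             otherwise one scalar per upstream module, which would be the
             only trainable quantity in learned composition.
    """
    n = len(modules)
    weights = [1.0 / n] * n if logits is None else softmax(logits)
    A = weighted_average([a for a, _ in modules], weights)
    B = weighted_average([b for _, b in modules], weights)
    return A, B, weights


# Two toy rank-1 modules for a layer with d_in = 2 and d_out = 2.
modules = [
    ([[1.0, 0.0]], [[2.0], [0.0]]),
    ([[3.0, 2.0]], [[0.0], [4.0]]),
]
A_uniform, B_uniform, _ = compose_lora(modules)
A_learned, B_learned, w = compose_lora(modules, logits=[1.0, -2.0])
```

Under these assumptions, learned composition trains one scalar per upstream module per layer instead of the full low-rank factors, which is consistent with the abstract's claim of significantly fewer trainable parameters. In practice the logits would be optimized by gradient descent on the few-shot downstream data, which this sketch omits.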

Paper Structure (29 sections, 6 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: Performance of fine-tuning strategies relative to classifier tuning in the one-shot transfer learning setting. Both learned (blue) and uniform (orange) composition methods mostly outperform regular LoRA (green) and full fine-tuning (red) baselines, suggesting that the linear interpolation of pre-trained LoRA modules helps few-shot transfer to an unseen downstream task. For each dataset, we use each of the remaining datasets as an upstream task. Refer to the task shift section (sec:task_shift) for experiment details.
  • Figure 2: Method overview. We start with a foundational model that has undergone LoRA fine-tuning on various tasks. During the few-shot adaptation phase, for each layer of the model, we apply a uniform (left) or learned (right) weighted averaging over the pre-trained upstream LoRA weights.
  • Figure 3: Label shift results. Few-shot transfer results with different numbers of adaptation samples. Uniform and learned composition methods consistently beat the other baselines when fewer adaptation samples are available.
  • Figure 4: Task shift results (vision). Few-shot transfer results with different numbers of adaptation samples. Both uniform and learned composition methods significantly improve performance with few adaptation samples, while maintaining performance comparable to regular LoRA fine-tuning in the full-shot scenario.
  • Figure 5: Effect of scaling the number of upstream tasks. The results on the Food101 dataset in the full-shot ($K$ = all) scenario indicate that, as the number of pre-trained upstream modules increases, the learned composition of these modules significantly enhances the model’s performance.
  • ...and 3 more figures