LoRA on the Go: Instance-level Dynamic LoRA Selection and Merging

Seungeon Lee, Soumi Das, Manish Gupta, Krishna P. Gummadi

TL;DR

LoGo tackles the challenge of deploying multiple LoRA adapters across heterogeneous input domains without task-specific training. It dynamically selects and merges adapters at inference by deriving relevance scores $s_i$ from projection outputs $\boldsymbol{o}_{i,T}$ through norms or inverse-entropy, and then computes a merged output via $\boldsymbol{o}_{\text{merge}} = \sum_{i\in \mathcal{S}} \tilde{w}_i \boldsymbol{o}_{i,T}$ with weights $\tilde{w}_i = s_i / \sum_{j\in \mathcal{S}} s_j$. This training-free, instance-level approach uses an output-based mixture merging strategy to avoid costly parameter recomputation, enabling real-time adaptation across evolving LoRA pools. Empirical results across 5 benchmarks, 27 datasets, and 3 model families show LoGo achieving competitive or better performance than training-based baselines on several tasks, with gains up to 3.6% on Struct-to-Text and NLI, while maintaining comparable inference throughput. The work demonstrates the practical value of training-free, instance-level adaptation for deploying large language models in diverse, dynamic environments.
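
The scoring and merging formulas above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the function names `relevance_scores` and `logo_merge`, the choice of the selected set $\mathcal{S}$ as the top-`k` adapters by score, and the entropy signal being computed from a softmax over each projection output. Only the normalization $\tilde{w}_i = s_i / \sum_{j\in \mathcal{S}} s_j$ and the weighted sum of outputs come directly from the summary.

```python
import numpy as np


def relevance_scores(outputs: np.ndarray, signal: str = "norm") -> np.ndarray:
    """Score each adapter from its projection output o_{i,T}.

    outputs has shape (num_adapters, hidden_dim); row i is o_{i,T}.
    """
    if signal == "norm":
        # l2 norm of each adapter's projection output.
        return np.linalg.norm(outputs, axis=1)
    if signal == "entropy":
        # Assumption: turn each output into a distribution with a softmax,
        # then use the inverse of its Shannon entropy as the score.
        z = outputs - outputs.max(axis=1, keepdims=True)
        p = np.exp(z)
        p /= p.sum(axis=1, keepdims=True)
        entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
        return 1.0 / (entropy + 1e-12)
    raise ValueError(f"unknown signal: {signal!r}")


def logo_merge(outputs: np.ndarray, k: int = 2, signal: str = "norm"):
    """Select adapters and merge their outputs with normalized scores.

    Assumption: the selected set S is the k highest-scoring adapters.
    Returns (o_merge, selected_indices, normalized_weights).
    """
    s = relevance_scores(outputs, signal)
    selected = np.argsort(s)[::-1][:k]          # indices of the top-k scores
    w = s[selected] / s[selected].sum()         # w_i = s_i / sum_{j in S} s_j
    merged = (w[:, None] * outputs[selected]).sum(axis=0)
    return merged, selected, w


# Three hypothetical adapters with 2-dimensional projection outputs.
demo_outputs = np.array([[3.0, 4.0], [0.0, 1.0], [6.0, 8.0]])
demo_merged, demo_selected, demo_weights = logo_merge(demo_outputs, k=2)
```

Because the merge acts on adapter outputs rather than on adapter weight matrices, adding or removing an adapter from the pool changes only the rows of `outputs`; no merged parameters have to be recomputed.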

Abstract

Low-Rank Adaptation (LoRA) has emerged as a parameter-efficient approach for fine-tuning large language models. However, conventional LoRA adapters are typically trained for a single task, limiting their applicability in real-world settings where inputs may span diverse and unpredictable domains. Existing approaches combine multiple LoRAs at inference time to improve performance on diverse tasks, but they usually require labeled data or additional task-specific training, which is expensive at scale. In this work, we introduce LoRA on the Go (LoGo), a training-free framework that dynamically selects and merges adapters at the instance level without any additional requirements. LoGo leverages signals extracted from a single forward pass through LoRA adapters to identify the most relevant adapters and determine their contributions on the fly. Across 5 NLP benchmarks, 27 datasets, and 3 model families, LoGo outperforms training-based baselines on some tasks by a margin of up to 3.6% while remaining competitive on other tasks and maintaining inference throughput, highlighting its effectiveness and practicality.

Paper Structure

This paper contains 35 sections, 6 equations, 15 figures, 5 tables, 1 algorithm.

Figures (15)

  • Figure 1: Overall workflow of the proposed LoRA on the Go (LoGo) framework.
  • Figure 2: Heatmap illustrating signal patterns across LoRA adapters trained on top of the Qwen-2.5-7B backbone. The x-axis represents LoRAs trained on different tasks, while the y-axis corresponds to datasets from those tasks. Each cell shows the $\ell_2$ norm of the projection outputs. The norm values are min-max normalized to [0,1] across datasets for each LoRA. Related task clusters are highlighted in red boxes. More results on signal intensity are in the appendix.
  • Figure 3: Alignment between merging weights and task similarity of LoGo with (a) norm and (b) entropy as signals for Big-Bench Hard task and Qwen2.5-32B model.
  • Figure 4: Comparison of (a) LoRA selection count by LoGo with the Llama-3.1-8B model and (b) task similarity for the BBH Word Sorting dataset. Each color in a bar represents the priority of the LoRA when it was selected.
  • Figure 5: Performance of LoGo using (a) norm and (b) entropy across datasets with different merging methods -- mixture and fusion.
  • ...and 10 more figures