Scalable Multi-Task Low-Rank Model Adaptation

Zichen Tian; Antoine Ledent; Qianru Sun

Abstract

Scaling multi-task low-rank adaptation (LoRA) to a large number of tasks induces catastrophic performance degradation: for example, accuracy on DOTA drops from 88.2% to 2.0% when scaling from 5 to 15 tasks. This failure stems from parameter and representation misalignment. We find that existing solutions, such as regularization and dynamic routing, fail at scale because they are constrained by a fundamental trade-off: strengthening regularization to reduce inter-task conflict inadvertently suppresses the feature discrimination that effective routing requires. In this work, we identify two root causes of this trade-off. First, uniform regularization disrupts inter-task knowledge sharing: shared knowledge concentrates in the components with high singular values (high-SV; 89% alignment on Flanv2->BBH), and uniform regularization forces these components to update in orthogonal directions, directly disrupting that shared knowledge. Second, conflict amplification: applying LoRA at the component level (e.g., W_q, W_v) amplifies gradient conflicts; we show that block-level adaptation reduces this conflict by 76% with only 50% of the parameters. Based on these insights, we propose mtLoRA, a scalable solution with three novel designs: 1) Spectral-Aware Regularization, which selectively orthogonalizes low-SV components while preserving high-SV shared knowledge; 2) Block-Level Adaptation, which mitigates conflict amplification and substantially improves parameter efficiency; and 3) Fine-Grained Routing, which uses dimension-specific weights for greater expressive power. On four large-scale (15-25 tasks) benchmarks spanning vision (DOTA and iNat2018) and NLP (Dolly-15k and BBH), mtLoRA achieves 91.7%, 81.5%, 44.5%, and 38.5% accuracy on DOTA, iNat2018, Dolly-15k, and BBH, respectively, outperforming the state of the art by 2.3% on average while using 47% fewer parameters and 24% less training time.
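The spectral-aware idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's formulation. It assumes that each task's update is available as a dense matrix, that the top fraction of singular values (`keep_top_frac`, a hypothetical parameter) is exempt from the penalty, and that the penalty is the squared Frobenius norm of the cross-task products of the remaining low-SV parts. The function names are illustrative only.

```python
import numpy as np


def low_sv_part(delta_w, keep_top_frac):
    """Return delta_w with its top singular components removed.

    keep_top_frac is an assumed hyperparameter: the fraction of singular
    values treated as "high-SV" and therefore left unregularized.
    """
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    k = int(np.ceil(keep_top_frac * len(s)))
    top = (u[:, :k] * s[:k]) @ vt[:k]  # rank-k reconstruction of the high-SV part
    return delta_w - top


def spectral_aware_penalty(deltas, keep_top_frac=0.2):
    """Sum of squared Frobenius norms of pairwise products of low-SV parts.

    With keep_top_frac=0.0 this reduces to a uniform orthogonality penalty
    over the full updates (again, an assumed form of that baseline).
    """
    lows = [low_sv_part(d, keep_top_frac) for d in deltas]
    penalty = 0.0
    for i in range(len(lows)):
        for j in range(i + 1, len(lows)):
            penalty += np.linalg.norm(lows[i].T @ lows[j], "fro") ** 2
    return penalty
```

Under these assumptions, two task updates that share a dominant direction but differ in their minor directions incur no penalty when the dominant direction is exempt, whereas the uniform variant (`keep_top_frac=0.0`) penalizes the shared direction itself.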

Paper Structure (17 sections, 4 equations, 2 figures, 6 tables)

Introduction
Related Works
Multi-Task LoRA Adaptation.
Multi-Task LoRA Placement Strategies.
Method
Task Formulation
Spectral-Aware Regularization
Fine-Grained Routing
Block-Level Adaptation
Why does block-level adaptation work?
Experiments
Experimental Setup
Challenge of Multi-Task Collapse
Ablation Studies of Our Method
Comparison with State-of-the-Art
...and 2 more sections

Figures (2)

Figure 1: Motivating observations for our three novel designs. (A) Orthogonal regularization introduces a trade-off between conflict reduction and routing uncertainty. Specifically, under orthogonal regularization, the model accuracy (blue curve) peaks at $\lambda=0.25$ (+1.7%) but degrades at $\lambda=1.0$ (-1.8%), due to increased routing uncertainty (i.e., Routing Entropy, orange curve). (B) Shared knowledge concentrates in high-SV components. Specifically, high-SV (top-20%, highlighted) shows 89% inter-task alignment and encodes 54% of total singular values, while low-SV (50-100%) shows only 3% alignment with 22% of singular values (detailed in Sec. \ref{sec:exp_setup}). This motivates spectral-aware regularization: preserve high-SV shared knowledge and orthogonalize only low-SV components. (C) Block-level LoRA adaptation reduces gradient conflicts. Specifically, block-level adaptation achieves higher gradient alignment between tasks (measured by cosine similarity, $-0.013 \pm 0.169$) than component-level adaptation ($-0.054 \pm 0.201$), accompanied by a +2.1% accuracy improvement (91.2% vs. 89.0% in Table \ref{tab:attaching_level}). See Sec. \ref{sec:exp_ablation} for detailed experimental setups.
Figure 2: The architectural innovations of mtLoRA. (A) Block-Level Adaptation. The LoRA update is computed in a parallel path that bypasses the block's internal non-linearities, mitigating gradient conflict amplification. This path takes the same LayerNorm output as the main block. (B) Fine-Grained Routing. Within the parallel path, a router (lightweight MLP) generates dimension-specific weight vectors to compose task experts, allowing different feature subspaces to use different LoRA combinations.
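The parallel path and per-dimension routing described in the Figure 2 caption can be sketched in NumPy as below. This is a rough illustration under stated assumptions, not the authors' implementation: the two-layer `tanh` router, the softmax over experts for each output dimension, the tensor shapes, and all names (`fine_grained_lora_path`, `router_w1`, `router_w2`) are hypothetical choices made for the example.

```python
import numpy as np


def fine_grained_lora_path(h, A, B, router_w1, router_w2):
    """Compose LoRA experts with dimension-specific routing weights.

    h:         (d,)      LayerNorm output shared with the main block
    A:         (E, r, d) down-projections, one per expert
    B:         (E, d, r) up-projections, one per expert
    router_w1: (m, d)    first router layer (assumed tanh MLP)
    router_w2: (E*d, m)  second router layer, one logit per (expert, dim)
    Returns the (d,) update and the (E, d) routing weights.
    """
    num_experts, d, _ = B.shape
    # Each expert's low-rank update B_e @ A_e @ h, stacked to shape (E, d).
    expert_out = np.einsum("edr,erk,k->ed", B, A, h)
    # Router produces a separate weight for every (expert, dimension) pair.
    hidden = np.tanh(router_w1 @ h)
    logits = (router_w2 @ hidden).reshape(num_experts, d)
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)  # softmax over experts
    # Each output dimension mixes the experts with its own weights.
    return (weights * expert_out).sum(axis=0), weights
```

In a block-level setup of this kind, the returned update would be added to the output of the frozen block rather than to individual projections such as W_q or W_v, so the update itself does not pass through the block's internal non-linearities. A scalar-per-expert router is the special case in which every column of `weights` is identical.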
