HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning

Chunlin Tian; Zhan Shi; Zhijiang Guo; Li Li; Chengzhong Xu

TL;DR

This work addresses the inefficiency of conventional LoRA in heterogeneous, multi-task domains and shows that a single LoRA head suffers from cross-task interference. It introduces HydraLoRA, an asymmetric LoRA architecture with a shared matrix $A$ and multiple task-specific matrices $B_i$, in which a Mixture-of-Experts (MoE) router automatically allocates inputs to the appropriate adapters. Across single-domain and multi-task benchmarks, HydraLoRA consistently outperforms standard PEFT methods, and even LoRA with task-specific splits, while keeping parameter overhead low through a shared $A$ and specialized $B_i$ heads. The approach enables domain-robust fine-tuning and efficient inference, offering a practical path to high-performance, low-parameter LLM adaptation for complex real-world tasks.
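The shared-$A$, multi-$B$ design can be illustrated with a minimal NumPy sketch of one adapted linear layer. It assumes the output takes the form $W_0x + \sum_i \omega_i B_i A x$, with $\omega$ produced by a softmax router over the input; the function names, shapes, and the router weight `W_g` are hypothetical choices for illustration, not the authors' implementation. The sketch also checks that merging the $B_i$ with the router weights before applying them gives the same output.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def hydra_forward(x, W0, A, Bs, W_g):
    """Hypothetical HydraLoRA-style layer for a single input vector.

    x:   (d_in,)          input
    W0:  (d_out, d_in)    frozen pretrained weight
    A:   (r, d_in)        shared down-projection
    Bs:  (N, d_out, r)    task-specific up-projections B_i
    W_g: (N, d_in)        router weights (assumed linear + softmax)
    """
    omega = softmax(W_g @ x)               # (N,) gate weights summing to 1
    z = A @ x                              # (r,) shared low-rank code
    heads = Bs @ z                         # (N, d_out), one output per B_i
    delta = (omega[:, None] * heads).sum(axis=0)
    return W0 @ x + delta, omega

def hydra_forward_merged(x, W0, A, Bs, W_g):
    # Same computation, but the B_i are first merged with the router weights.
    omega = softmax(W_g @ x)
    B_merged = np.tensordot(omega, Bs, axes=1)   # (d_out, r)
    return W0 @ x + B_merged @ (A @ x)

rng = np.random.default_rng(0)
d_in, d_out, r, N = 16, 12, 4, 3
x = rng.normal(size=d_in)
W0 = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
Bs = rng.normal(size=(N, d_out, r))
W_g = rng.normal(size=(N, d_in))

y, omega = hydra_forward(x, W0, A, Bs, W_g)
y_merged = hydra_forward_merged(x, W0, A, Bs, W_g)
print(np.allclose(y, y_merged))  # True: merging the B_i first is equivalent
```

Because the update is linear in the $B_i$, the router weights can be folded into a single merged $B$ per input, in the spirit of the router-driven merging that Figure 4 describes for inference. A real implementation would batch tokens and train $A$, the $B_i$, and the router jointly.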

Abstract

Parameter-Efficient Fine-Tuning (PEFT) techniques such as LoRA have made adapting Large Language Models (LLMs) to new tasks more efficient. However, these methods often underperform full fine-tuning, and the gap widens on complex datasets and domains, highlighting the need for better PEFT approaches. Through a series of experiments, we uncover two key insights into the training and parameter inefficiency of LoRA. Building on these insights, we develop HydraLoRA, a LoRA framework with an asymmetric structure that eliminates the need for domain expertise. Our experiments demonstrate that HydraLoRA outperforms other PEFT approaches, even those that rely on domain knowledge during training and inference.

Paper Structure (42 sections, 5 equations, 10 figures, 4 tables)

Introduction
Background and Motivation
LoRA Basics
LoRA's Practical Dilemma
Observations
HydraLoRA
Asymmetric LoRA architecture
Workflow of HydraLoRA
Fine-tuning
Inference
Experiments
Experiment Setting
Dataset and Benchmarks
Baselines
Overall Performance
...and 27 more sections

Figures (10)

Figure 1: Illustration of the LoRA architecture changes in HydraLoRA. Only tunable parameters are shown. (a) Standard LoRA, in which matrix A projects the input down to a low rank and matrix B projects it back up. (b) Under the same parameter count, a monolithic LoRA is split into multiple smaller A and B matrices to avoid training interference. (c) Building on (b), HydraLoRA adopts an asymmetric structure with a shared A matrix and multiple B matrices.
Figure 2: Performance impact of corpus heterogeneity on full fine-tuning vs. parameter-efficient fine-tuning. Heterogeneity denotes the diversity within a dataset, whose varied content and style often cause interference [S-LoRA-FL]. Parameter-efficient approaches are particularly sensitive and suffer larger performance losses on heterogeneous data.
Figure 3: Breakdown analysis of LoRA modules. t-SNE comparison of LoRA modules fine-tuned on Dolly-15K [DollyV2] and on three of its subtasks: summarization (Sum), closed QA (QA), and information extraction (IE). The base model is LLaMA2-7B (random seed = 42), whose 32 decoder layers correspond to 32 adapter modules. Each module consists of four submodules, {0: q_proj of A, 1: q_proj of B, 2: v_proj of A, 3: v_proj of B}, for a total of $32\times4 = 128$ submodules. Left: all submodules. Center: even-indexed submodules, i.e., the A matrices. Right: odd-indexed submodules, i.e., the B matrices. The differences between LoRA modules fine-tuned for different tasks arise mainly from the B matrices.
Figure 4: Architecture and workflow of HydraLoRA. During fine-tuning, HydraLoRA first adaptively identifies and initializes $k$ intrinsic components without domain-specific knowledge. It then employs a trainable MoE router that treats each intrinsic component as an expert and automatically routes training samples to the intrinsic components for fine-tuning. During inference, HydraLoRA flexibly and dynamically merges the multiple $B$ matrices using the trained router.
Figure 5: Energy consumption and latency during fine-tuning with different LoRA approaches (fine-tuning LLaMA2-7B with GSM-8K).
...and 5 more figures
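The parameter accounting behind Figure 1 can be sketched for a single adapted weight of shape $d_{out} \times d_{in}$. The helpers below are hypothetical, and the ranks and head count are illustrative assumptions rather than the paper's configurations; router parameters are ignored. Splitting a rank-$r$ LoRA into $N$ heads of rank $r/N$ leaves the count unchanged (panel b), while sharing $A$ (panel c) stores one $A$ and $N$ $B$ matrices.

```python
def lora_params(d_in, d_out, r):
    # (a) One A (r x d_in) plus one B (d_out x r).
    return r * d_in + d_out * r

def split_lora_params(d_in, d_out, r, n_heads):
    # (b) n_heads independent (A_i, B_i) pairs, each of rank r // n_heads.
    assert r % n_heads == 0, "rank must divide evenly across heads"
    return n_heads * lora_params(d_in, d_out, r // n_heads)

def hydra_params(d_in, d_out, r_head, n_heads):
    # (c) One shared A (r_head x d_in) plus n_heads B_i (d_out x r_head).
    # Router parameters are deliberately not counted in this sketch.
    return r_head * d_in + n_heads * d_out * r_head

d = 4096  # LLaMA2-7B hidden size; q_proj and v_proj are d x d
r, n = 8, 4  # illustrative rank and number of heads (assumptions)

print(lora_params(d, d, r))             # 65536
print(split_lora_params(d, d, r, n))    # 65536
print(hydra_params(d, d, r // n, n))    # 40960
```

With these illustrative numbers, sharing $A$ removes the three redundant $A$ copies of the split design (3 x 2 x 4096 = 24576 parameters); the savings grow with $N$ and with $d_{in}$.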
