MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning

Yaming Yang, Dilxat Muhtar, Yelong Shen, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Denvy Deng, Feng Sun, Qi Zhang, Weizhu Chen, Yunhai Tong

TL;DR

MTL-LoRA addresses the interference problem observed when applying low-rank adapters to multi-task fine-tuning by introducing per-task transformations and multiple information-sharing pathways in a low-dimensional space. The method preserves task-specific signals via a task-specific diagonal transform and enables adaptive cross-task information exchange through a set of up-projection matrices with softmax-based weighting. Empirical results across GLUE, commonsense benchmarks, image-text tasks, and a large-scale Ads dataset show that MTL-LoRA consistently outperforms LoRA and several variants, often with substantially fewer trainable parameters. The work also analyzes task differentiation, robustness, and inference overhead, confirming practical benefits for multi-task learning with parameter-efficient fine-tuning in real-world settings.
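The mechanism summarized above can be illustrated with a minimal NumPy sketch of a single adapted linear layer. It is not the authors' implementation: the class name `MTLLoRASketch`, the argument names, the initialization choices, and the temperature `tau` are all assumptions made for illustration. The sketch takes the summary's description at face value: a shared down-projection, a per-task diagonal transform in the rank-`r` space, and several up-projections mixed with per-task softmax weights.

```python
import numpy as np


def softmax(logits, tau=1.0):
    """Numerically stable softmax with an (assumed) temperature tau."""
    z = np.asarray(logits, dtype=float) / tau
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


class MTLLoRASketch:
    """Illustrative layer only; names and initialization are hypothetical."""

    def __init__(self, d_in, d_out, rank, n_tasks, n_up, tau=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen pre-trained weight (random here, purely for the demo).
        self.W = rng.normal(scale=0.02, size=(d_out, d_in))
        # Shared down-projection into the low-dimensional space.
        self.A = rng.normal(scale=0.02, size=(rank, d_in))
        # One diagonal transform per task, stored as a vector of length rank.
        self.lam = np.ones((n_tasks, rank))
        # Several up-projections; zero init so the adapter starts as a no-op.
        self.B = np.zeros((n_up, d_out, rank))
        # Per-task logits that weight the up-projections via softmax.
        self.logits = np.zeros((n_tasks, n_up))
        self.tau = tau

    def forward(self, x, task):
        z = self.A @ x                        # project down: (rank,)
        z = self.lam[task] * z                # task-specific diagonal transform
        w = softmax(self.logits[task], self.tau)     # mixing weights: (n_up,)
        ups = np.einsum("kor,r->ko", self.B, z)      # each up-projection: (n_up, d_out)
        return self.W @ x + w @ ups           # frozen path + weighted adapter path


layer = MTLLoRASketch(d_in=8, d_out=6, rank=2, n_tasks=3, n_up=4)
x = np.ones(8)
y_task0 = layer.forward(x, task=0)
```

With zero-initialized up-projections the layer reproduces the frozen output `W @ x`. Once `B`, `lam`, and `logits` are trained, the same input yields different outputs for different task indices, which is the task differentiation the summary refers to.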

Abstract

Parameter-efficient fine-tuning (PEFT) has been widely employed for domain adaptation, with LoRA being one of the most prominent methods due to its simplicity and effectiveness. However, in multi-task learning (MTL) scenarios, LoRA tends to obscure the distinction between tasks by projecting sparse high-dimensional features from different tasks into the same dense low-dimensional intrinsic space. This leads to task interference and suboptimal performance for LoRA and its variants. To tackle this challenge, we propose MTL-LoRA, which retains the advantages of low-rank adaptation while significantly enhancing MTL capabilities. MTL-LoRA augments LoRA by incorporating additional task-adaptive parameters that differentiate task-specific information and capture shared knowledge across various tasks within low-dimensional spaces. This approach enables pre-trained models to jointly adapt to different target domains with a limited number of trainable parameters. Comprehensive experimental results, including evaluations on public academic benchmarks for natural language understanding, commonsense reasoning, and image-text understanding, as well as real-world industrial text Ads relevance datasets, demonstrate that MTL-LoRA outperforms LoRA and its variants with comparable or even fewer learnable parameters in the MTL setting.

Paper Structure

This paper contains 32 sections, 4 equations, 4 figures, and 11 tables.

Figures (4)

  • Figure 1: t-SNE visualization of task-specific features extracted from the $\mathbf{O}$ linear layer of the final block in the LLaMA2-7B model, comparing LoRA and MTL-LoRA after fine-tuning on a commonsense reasoning dataset.
  • Figure 2: The overall architecture of MTL-LoRA. MTL-LoRA employs task-specific transformation matrices and multiple up-projection matrices to learn both task-specific and shared information.
  • Figure 3: The performance of MTL-LoRA on the GLUE benchmark with different hyperparameter configurations.
  • Figure 4: Inference latency of different low-rank adapters on LLaMA2-7B with varying batch sizes. The results are averaged over 100 runs on a single A100-80G GPU. DoRA and MultiLoRA exhibit similar performance to LoRA as they can also be merged into pre-trained weights.
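
The Figure 4 caption relies on the fact that a plain LoRA update can be folded into the pre-trained weight, so the merged layer costs the same as the original at inference time. The following is a minimal sketch of that merge for standard LoRA only. The function names and shapes are illustrative, and the usual LoRA scaling factor is omitted for brevity.

```python
import numpy as np


def lora_forward(W, A, B, x):
    """Unmerged path: frozen weight plus a separate low-rank branch."""
    return W @ x + B @ (A @ x)


def merge_lora(W, A, B):
    """Fold the low-rank update into a single dense weight matrix."""
    return W + B @ A


rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 8, 2
W = rng.normal(size=(d_out, d_in))   # stand-in for a pre-trained weight
A = rng.normal(size=(rank, d_in))    # down-projection
B = rng.normal(size=(d_out, rank))   # up-projection
x = rng.normal(size=d_in)

W_merged = merge_lora(W, A, B)
same = np.allclose(lora_forward(W, A, B, x), W_merged @ x)
print(same)
```

After merging, inference needs one matrix-vector product instead of three, which is why mergeable adapters track the base model's latency.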