MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning

Ahmed Agiza, Marina Neseem, Sherief Reda

TL;DR

MTLoRA tackles the inefficiency of fine-tuning large pre-trained models for multiple tasks by introducing Task-Agnostic LoRA (TA-LoRA) and Task-Specific LoRA (TS-LoRA) to a shared hierarchical vision-transformer backbone. It enables cross-task knowledge sharing while preserving task-specific updates through a multi-scale encoder–decoder architecture and per-task feature fusion. On the PASCAL MTL benchmark, MTLoRA achieves higher average task performance than full fine-tuning while reducing trainable parameters by $3.6\times$, and it attains Pareto-optimal efficiency compared with state-of-the-art PEFT methods; MTLoRA+ offers further parameter reductions by extending low-rank modules to patch merging. The approach generalizes across backbones and decoders and scales with more tasks, providing a practical, scalable solution for efficient multi-task learning in dense-vision settings; code is publicly available.

Abstract

Adapting models pre-trained on large-scale datasets to a variety of downstream tasks is a common strategy in deep learning. Consequently, parameter-efficient fine-tuning methods have emerged as a promising way to adapt pre-trained models to different tasks while training only a minimal number of parameters. While most of these methods are designed for single-task adaptation, parameter-efficient training in Multi-Task Learning (MTL) architectures is still unexplored. In this paper, we introduce MTLoRA, a novel framework for parameter-efficient training of MTL models. MTLoRA employs Task-Agnostic and Task-Specific Low-Rank Adaptation modules, which effectively disentangle the parameter space in MTL fine-tuning, thereby enabling the model to adeptly handle both task specialization and interaction within MTL contexts. We applied MTLoRA to hierarchical-transformer-based MTL architectures, adapting them to multiple downstream dense prediction tasks. Our extensive experiments on the PASCAL dataset show that MTLoRA achieves higher accuracy on downstream tasks compared to fully fine-tuning the MTL model while reducing the number of trainable parameters by 3.6x. Furthermore, MTLoRA establishes a Pareto-optimal trade-off between the number of trainable parameters and the accuracy of the downstream tasks, outperforming current state-of-the-art parameter-efficient training methods in both accuracy and efficiency. Our code is publicly available.
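
The split between task-agnostic and task-specific low-rank modules described above can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the class names (`LowRank`, `TaskAgnosticLinear`, `TaskSpecificLinear`), the `alpha / r` scaling, the zero-initialised `B` matrix (borrowed from standard LoRA practice), and the choice of one low-rank pair per task are all assumptions made for illustration, and the paper's actual TS-LoRA design may differ.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix, standing in for a learned or pre-trained weight."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LowRank:
    """Trainable rank-r update x -> (alpha / r) * B @ (A @ x).

    B starts at zero (an assumption taken from standard LoRA), so the
    adapted layer initially reproduces the frozen layer exactly.
    """

    def __init__(self, d_in, d_out, r, rng, alpha=1.0):
        self.A = rand_matrix(r, d_in, rng)
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def __call__(self, x):
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]

    def num_params(self):
        # r * d_in entries in A plus d_out * r entries in B.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


class TaskAgnosticLinear:
    """Frozen weight W plus ONE low-rank update shared by every task."""

    def __init__(self, W, r, rng):
        self.W = W  # frozen pre-trained weight, never updated
        self.shared = LowRank(len(W[0]), len(W), r, rng)

    def __call__(self, x):
        return add(matvec(self.W, x), self.shared(x))


class TaskSpecificLinear:
    """Frozen weight W plus one low-rank update PER task (hypothetical layout)."""

    def __init__(self, W, r, tasks, rng):
        self.W = W
        self.per_task = {t: LowRank(len(W[0]), len(W), r, rng) for t in tasks}

    def __call__(self, x):
        base = matvec(self.W, x)  # the frozen path is computed once and reused
        return {t: add(base, lr(x)) for t, lr in self.per_task.items()}
```

In this sketch a task-agnostic layer returns a single feature vector, while a task-specific layer returns one vector per task, which is what would let separate decoders consume task-specific features. Each low-rank pair adds `r * (d_in + d_out)` trainable parameters instead of the `d_in * d_out` needed to fine-tune the full weight.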

Paper Structure

This paper contains 19 sections, 4 equations, 7 figures, and 7 tables.

Figures (7)

  • Figure 1: MTLoRA versus state-of-the-art parameter-efficient training approaches using Swin-Tiny vision transformer as a backbone. r represents the different ranks for the low-rank decomposition modules inside MTLoRA.
  • Figure 2: (a) Individual Task Adaptation results in parallel execution paths for each task, resulting in inference and training time that scales linearly with the number of tasks. On the other hand, (b) Shared Multi-Task Adaptation maintains inference and training time close to the single task model since only the decoders are executed separately.
  • Figure 3: MTLoRA framework overview. Task-Agnostic LoRA modules (TA-LoRA) are placed at each transformer block, excluding the last ones in each stage where our Task-Specific LoRA (TS-LoRA) modules are placed to capture task-specific fine-tuning at different scales.
  • Figure 4: Accuracy versus trainable parameters of MTLoRA with task-agnostic vs task-specific adaptation modules.
  • Figure 5: Performance of MTLoRA on various downstream tasks when applied to a Swin-Base model pre-trained on ImageNet-22K.
  • ...and 2 more figures
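
Several of the captions above compare accuracy against the number of trainable parameters, and the abstract describes the resulting trade-off as Pareto-optimal. As a rough illustration of what that term means, the sketch below filters a list of configurations down to those not dominated by any other. The function name `pareto_front` and the numbers in the usage example are made up for illustration; they are not results from the paper.

```python
def pareto_front(points):
    """Return the names of configurations that no other configuration dominates.

    Each point is (name, trainable_params, accuracy). Fewer parameters and
    higher accuracy are both better. A point is dominated when another point
    is at least as good on both axes and strictly better on at least one.
    """
    front = []
    for name, params, acc in points:
        dominated = any(
            (q <= params and b >= acc) and (q < params or b > acc)
            for _, q, b in points
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical configurations: (name, trainable parameters in millions, accuracy).
configs = [
    ("A", 1.0, 60.0),
    ("B", 2.0, 65.0),
    ("C", 3.0, 64.0),  # dominated by B: more parameters and lower accuracy
    ("D", 4.0, 70.0),
]
front = pareto_front(configs)
```

Here `front` keeps A, B, and D and drops C, because B reaches higher accuracy with fewer trainable parameters.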