
Multi-Task Dense Prediction via Mixture of Low-Rank Experts

Yuqi Yang, Peng-Tao Jiang, Qibin Hou, Hao Zhang, Jinwei Chen, Bo Li

TL;DR

MLoRE tackles decoder-focused multi-task dense prediction by explicitly modeling global task relations through a task-sharing generic convolution path and by scaling capacity with linear, low-rank MoE experts. The three-path design (a task-sharing generic convolution, shared low-rank experts with per-task routing, and task-specific low-rank experts) captures cross-task correlation while preserving task discrimination. It keeps parameters and FLOPs in check because the linear experts can be re-parameterized into the generic convolution at inference. Key contributions include a global-relations-aware MoE for decoders, low-rank convolutions as MoE experts, and gains on all metrics on PASCAL-Context and NYUD-v2 with efficient deployment. The approach achieves state-of-the-art results across multiple dense prediction tasks and offers practical advantages for scalable, decoder-focused multi-task learning in vision systems.
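The TL;DR's efficiency argument rests on linearity: if an expert has no nonlinearity, its branch can be folded into the generic convolution. The pure-Python sketch below checks this numerically. It assumes each low-rank expert is a k x k convolution down to rank r followed by a 1 x 1 convolution back up (a LoRA-style factorization), that all branches share the same kernel size, and that the router weights for one task are fixed scalars (`gates`). All names and sizes are hypothetical, not the authors' code.

```python
import random

def conv2d(x, w):
    """Zero-padded 'same' 2D cross-correlation without bias.

    x: [C_in][H][W] nested lists; w: [C_out][C_in][k][k] with odd k.
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    p = k // 2
    out = [[[0.0] * wd for _ in range(h)] for _ in w]
    for o, w_o in enumerate(w):
        for i in range(h):
            for j in range(wd):
                s = 0.0
                for c in range(c_in):
                    for dy in range(k):
                        for dx in range(k):
                            y, z = i + dy - p, j + dx - p
                            if 0 <= y < h and 0 <= z < wd:
                                s += w_o[c][dy][dx] * x[c][y][z]
                out[o][i][j] = s
    return out

def rand_tensor(*shape):
    # Nested list of uniform random floats with the given shape.
    if len(shape) == 1:
        return [random.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [rand_tensor(*shape[1:]) for _ in range(shape[0])]

def add_scaled(w, v, g):
    # Element-wise w + g * v for nested lists of identical shape.
    if not isinstance(w, list):
        return w + g * v
    return [add_scaled(wi, vi, g) for wi, vi in zip(w, v)]

def flatten(t):
    # Flatten a nested list into a flat list of scalars.
    return [v for s in t for v in flatten(s)] if isinstance(t, list) else [t]

def merge_low_rank(a, b):
    # Fold a k x k conv A (C_in -> r) followed by a 1 x 1 conv B (r -> C_out)
    # into one k x k kernel: W[o][c][dy][dx] = sum_j B[o][j] * A[j][c][dy][dx].
    r, c_in, k = len(a), len(a[0]), len(a[0][0])
    return [[[[sum(b[o][j][0][0] * a[j][c][dy][dx] for j in range(r))
               for dx in range(k)] for dy in range(k)]
             for c in range(c_in)] for o in range(len(b))]

random.seed(0)
C_IN, C_OUT, RANK, K, H, W = 4, 4, 2, 3, 5, 5
x = rand_tensor(C_IN, H, W)
w_generic = rand_tensor(C_OUT, C_IN, K, K)        # task-sharing generic conv
experts = [(rand_tensor(RANK, C_IN, K, K), rand_tensor(C_OUT, RANK, 1, 1))
           for _ in range(3)]                       # hypothetical low-rank experts (A_i, B_i)
gates = [0.7, 0.0, 0.3]                             # hypothetical router weights for one task

# Training-time view: generic path plus gated low-rank expert branches.
branch_out = conv2d(x, w_generic)
for g, (a, b) in zip(gates, experts):
    branch_out = add_scaled(branch_out, conv2d(conv2d(x, a), b), g)

# Inference-time view: fold every branch into a single k x k kernel first.
w_merged = w_generic
for g, (a, b) in zip(gates, experts):
    w_merged = add_scaled(w_merged, merge_low_rank(a, b), g)
merged_out = conv2d(x, w_merged)

max_err = max(abs(p - q) for p, q in zip(flatten(branch_out), flatten(merged_out)))
print(f"max |branch - merged| = {max_err:.1e}")
```

The two outputs agree up to floating-point rounding, so at inference one convolution replaces the generic path plus all expert branches. With a data-dependent router, the merged kernel would have to be recomputed whenever the gate values change.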

Abstract

Previous multi-task dense prediction methods based on the Mixture of Experts (MoE) have achieved strong performance, but they neglect the importance of explicitly modeling the global relations among all tasks. In this paper, we present a novel decoder-focused method for multi-task dense prediction, called Mixture-of-Low-Rank-Experts (MLoRE). To model the global task relationships, MLoRE adds a generic convolution path to the original MoE structure, through which each task feature can pass for explicit parameter sharing. Furthermore, to control the parameters and computational cost brought by increasing the number of experts, we take inspiration from LoRA and propose to leverage the low-rank format of a vanilla convolution in the expert network. Since the low-rank experts have fewer parameters and can be dynamically re-parameterized into the generic convolution, the parameters and computational cost do not change much as the number of experts increases. Benefiting from this design, we increase the number of experts and their receptive fields to enlarge the representation capacity, facilitating the learning of multiple dense prediction tasks in a unified network. Extensive experiments on the PASCAL-Context and NYUD-v2 benchmarks show that our MLoRE achieves superior performance compared to previous state-of-the-art methods on all metrics. Our code is available at https://github.com/YuqiYang213/MLoRE.
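To put a number on "fewer parameters", the arithmetic below compares a vanilla k x k convolution with a low-rank expert. It assumes the same LoRA-style factorization as a k x k convolution from C_in to rank r followed by a 1 x 1 convolution from r to C_out; the channel width, kernel size, and rank are hypothetical values, not the paper's configuration.

```python
def conv_params(c_in, c_out, k):
    # Weight count of a vanilla k x k convolution (bias omitted).
    return k * k * c_in * c_out

def low_rank_expert_params(c_in, c_out, k, rank):
    # Assumed factorization: k x k conv (c_in -> rank), then 1 x 1 conv (rank -> c_out).
    return conv_params(c_in, rank, k) + conv_params(rank, c_out, 1)

C, K, RANK = 256, 3, 16  # hypothetical channel width, kernel size, and rank
full = conv_params(C, C, K)
low = low_rank_expert_params(C, C, K, RANK)
print(full, low, f"{low / full:.1%}")  # → 589824 40960 6.9%

for n in (8, 16, 32):
    # Extra weights from n experts: low-rank versus full-rank experts.
    print(n, "experts:", n * low, "low-rank vs", n * full, "full-rank")
```

Under these assumptions each expert costs about 7% of a full convolution, so the expert count can grow several-fold before the added weights approach those of a few vanilla convolutions.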

Paper Structure

This paper contains 29 sections, 5 equations, 11 figures, 14 tables.

Figures (11)

  • Figure 1: Performance comparison with state-of-the-art methods. Our MLoRE based on the proposed mixture of low-rank experts achieves superior performance on all tasks. $\uparrow$ denotes higher is better. $\downarrow$ denotes lower is better.
  • Figure 2: Overall framework of the proposed method. MLoRE modules are placed at several backbone layers, and the backbone feature from each selected layer is fed into its own MLoRE modules. At each selected layer, the backbone feature is first projected into task-specific features, which are then sent to three branches: the task-sharing convolution, the task-sharing low-rank expert networks gated by a task-specific router network, and the task-specific low-rank expert networks. The outputs of these branches are summed to produce the task-specific features. Two MLoRE modules are stacked at each selected layer.
  • Figure 3: Ablation study on the number of experts $N$ and the number of activated experts $K$. The right panel also shows how the parameter count of the MLoRE module changes as the number of experts increases.
  • Figure 4: (a) The relations between tasks and low-rank experts. (b) The ratio at which each expert is activated by different numbers of tasks, in the MLoRE module without the task-sharing generic path. Without the task-sharing generic path, only a few experts are activated by all five tasks. The horizontal axis represents the ranks of different experts.
  • Figure 5: Qualitative comparison among different methods, including InvPT ye2022inverted, TaskPrompter ye2022taskprompter, and ours. Best viewed zoomed in. Our method achieves better visual results than the other methods on all five tasks, thanks to the proposed MLoRE module.
  • ...and 6 more figures
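Figures 2 and 3 refer to a task-specific router that activates $K$ of $N$ shared low-rank experts, and Figure 4 counts how many tasks activate each expert. The sketch below illustrates top-K gating and that count. The router (global average pooling plus a linear layer), the softmax over the selected logits, the random stand-in features, and the task names are all assumptions for illustration, not the paper's exact formulation.

```python
import math
import random

def top_k_gates(logits, k):
    # Keep the k largest logits, softmax over them, and zero out the rest.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(logits))]

def task_router(feature, weight):
    # Hypothetical router: global average pooling, then a linear layer.
    # feature: [C][H*W]; weight: [N][C]; returns N logits.
    pooled = [sum(ch) / len(ch) for ch in feature]
    return [sum(w * p for w, p in zip(row, pooled)) for row in weight]

random.seed(0)
C, HW, N, K = 8, 16, 6, 2
tasks = ["semseg", "parts", "saliency", "normals", "edge"]  # illustrative task names
routers = {t: [[random.gauss(0.0, 1.0) for _ in range(C)] for _ in range(N)] for t in tasks}

usage = [0] * N  # how many tasks activate each expert
for t in tasks:
    feat = [[random.gauss(0.0, 1.0) for _ in range(HW)] for _ in range(C)]  # stand-in task feature
    gates = top_k_gates(task_router(feat, routers[t]), K)
    active = [i for i, g in enumerate(gates) if g > 0.0]
    for i in active:
        usage[i] += 1
    print(t, "-> experts", active, "weights", [round(gates[i], 3) for i in active])
print("tasks per expert:", usage)
```

Each task evaluates only $K$ experts regardless of $N$, and varying `N` and `K` in this sketch mirrors the two axes ablated in Figure 3.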