Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning
Minghao Yang, Ren Togo, Guang Li, Takahiro Ogawa, Miki Haseyama
TL;DR
The paper tackles inefficiencies in MoE-based multi-task learning that arise when models are initialized from single-task pretrained backbones, which can cause redundant adaptation and gradient conflicts during the STL-to-MTL transition. It introduces Adaptive Shared Experts (ASE) within a LoRA-MoE framework, where shared experts receive router-computed gating weights that are normalized jointly with those of the sparse experts, yielding a dynamic balance that favors shared expertise early in training and gradually shifts toward task-specific specialization. It further adds fine-grained LoRA experts (more experts, each of lower rank) to improve expert cooperation under a comparable parameter budget. Empirical results on PASCAL-Context show consistent gains (e.g., Seg mIoU up to 74.0 and average $\Delta_m$ around +7.58%) with modest parameter overhead (~4%), supporting the approach's effectiveness and scalability for multi-task vision models.
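The joint normalization described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the router emits one logit per sparse expert and one per shared expert, that the top-k sparse experts are selected first, and that a single softmax is then taken over the selected sparse logits together with the shared logits. The function name `joint_gates` and all input values are hypothetical.

```python
import math


def joint_gates(sparse_logits, shared_logits, top_k):
    """Return (sparse_gates, shared_gates) whose weights jointly sum to 1.

    Assumption: top-k selection over sparse experts, followed by one softmax
    over the selected sparse logits plus all shared-expert logits.
    """
    # Indices of the top-k sparse experts, ranked by router logit.
    order = sorted(range(len(sparse_logits)),
                   key=lambda i: sparse_logits[i], reverse=True)
    top_idx = order[:top_k]

    # Joint softmax over selected sparse logits and shared logits.
    selected = [sparse_logits[i] for i in top_idx] + list(shared_logits)
    peak = max(selected)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in selected]
    total = sum(exps)
    probs = [e / total for e in exps]

    sparse_gates = {i: probs[j] for j, i in enumerate(top_idx)}
    shared_gates = probs[len(top_idx):]
    return sparse_gates, shared_gates


# Hypothetical router outputs: four sparse experts, one shared expert.
sparse_gates, shared_gates = joint_gates([2.0, 0.5, 1.0, -1.0], [0.0], top_k=2)
```

Because the shared expert's weight comes out of the same softmax as the sparse weights, it is not a fixed constant: under these assumptions it grows or shrinks with the router logits, which is what allows the balance between shared and task-specific experts to shift during training.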
Abstract
Mixture-of-Experts (MoE) has emerged as a powerful framework for multi-task learning (MTL). However, existing MoE-MTL methods often rely on single-task pretrained backbones and suffer from redundant adaptation and inefficient knowledge sharing during the transition from single-task to multi-task learning (STL to MTL). To address these limitations, we propose adaptive shared experts (ASE) within a low-rank adaptation (LoRA) based MoE, where shared experts are assigned router-computed gating weights that are jointly normalized with those of the sparse experts. This design facilitates the STL-to-MTL transition and enhances both expert specialization and cooperation. Furthermore, we incorporate fine-grained experts by increasing the number of LoRA experts while proportionally reducing their rank, enabling more effective knowledge sharing under a comparable parameter budget. Extensive experiments on the PASCAL-Context benchmark, under unified training settings, demonstrate that ASE consistently improves performance across diverse configurations and validate the effectiveness of fine-grained designs for MTL.
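The parameter-budget argument behind fine-grained experts can be checked with simple arithmetic. The sketch below assumes the standard LoRA factorization, in which an expert of rank r on a d_in-by-d_out layer adds r * (d_in + d_out) parameters. The layer width, expert counts, and ranks are hypothetical values chosen for illustration, not configurations reported in the paper.

```python
def lora_expert_params(d_in, d_out, rank):
    """Parameters of one LoRA expert: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def total_expert_params(num_experts, d_in, d_out, rank):
    """Total LoRA parameters across all experts (router weights excluded)."""
    return num_experts * lora_expert_params(d_in, d_out, rank)


# Hypothetical layer width of 768 on both sides.
coarse = total_expert_params(num_experts=4, d_in=768, d_out=768, rank=16)
fine = total_expert_params(num_experts=16, d_in=768, d_out=768, rank=4)

print(coarse, fine)  # both are 98304
```

Multiplying the expert count by four while dividing the rank by four leaves the LoRA parameter count unchanged. A router with one output per expert would still grow slightly with the number of experts, so the overall budget is comparable rather than strictly identical.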
