MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation

Shen Yuan; Yin Zheng; Taifeng Wang; Binbin Liu; Hongteng Xu

TL;DR

MoORE introduces an SVD-based model MoE-ization that converts a pre-trained weight matrix into a complete mixture of orthogonal rank-one experts, enabling conflict- and oblivion-resistant multi-task adaptation. It decomposes $W = U \operatorname{diag}(\boldsymbol{\sigma}) V^{\top}$ and treats each rank-one term $\mathbf{u}_d \mathbf{v}_d^{\top}$ as an expert. A learnable router combines task- and sample-level cues to reweight the experts, which amounts to adjusting the singular values. A learnable Householder-based orthogonal adapter on the right singular vectors increases capacity while preserving the original column space $\text{Range}(W)$. Because the experts are orthogonal, and therefore non-redundant, interference among the new tasks is reduced. Because $\text{Range}(W)$ is preserved, pre-trained capabilities are retained, which reduces forgetting of the original tasks. Experiments on CSR-MTL, NLU-MTL, and OR-MTL show that MoORE improves conflict- and oblivion-resistance and achieves competitive inference efficiency compared with baselines such as LoRA- and MixLoRA-based MoEs. Overall, MoORE offers a scalable, intrinsic MoE formulation for multi-task adaptation with strong empirical gains and practical efficiency.
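The decomposition and the adapter can be written out as a short derivation. This is a sketch under stated assumptions, not necessarily the paper's exact parameterization. The gate vector $\mathbf{g}(\mathbf{x}, t)$ for input $\mathbf{x}$ and task $t$, the Householder vectors $\mathbf{h}_k$, and the multiplicative form $g_d \sigma_d$ are illustrative notation.

```latex
\begin{aligned}
W &= U \operatorname{diag}(\boldsymbol{\sigma}) V^{\top}
   = \sum_{d=1}^{D} \sigma_d\, \mathbf{u}_d \mathbf{v}_d^{\top}, \\
H &= \prod_{k=1}^{K} \left( I - 2\, \frac{\mathbf{h}_k \mathbf{h}_k^{\top}}{\mathbf{h}_k^{\top} \mathbf{h}_k} \right),
   \qquad H^{\top} H = I, \\
W(\mathbf{x}, t) &= \sum_{d=1}^{D} g_d(\mathbf{x}, t)\, \sigma_d\, \mathbf{u}_d (H \mathbf{v}_d)^{\top}
   = U \operatorname{diag}\!\big(\mathbf{g}(\mathbf{x}, t) \odot \boldsymbol{\sigma}\big) (H V)^{\top}.
\end{aligned}
```

Two properties follow directly from this form:

- **Orthogonal experts.** For $i \neq j$, the Frobenius inner product of experts $i$ and $j$ is $(\mathbf{u}_i^{\top}\mathbf{u}_j)(\mathbf{v}_i^{\top} H^{\top} H \mathbf{v}_j) = 0$. The experts therefore remain orthogonal after the rotation.
- **Preserved column space.** Every output $W(\mathbf{x}, t)\,\mathbf{x} = \sum_{d} g_d \sigma_d (\mathbf{v}_d^{\top} H^{\top} \mathbf{x})\, \mathbf{u}_d$ is a combination of the $\mathbf{u}_d$ with $\sigma_d > 0$. It therefore lies in $\text{Range}(W)$.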

Abstract

Multi-task adaptation of large-scale foundation models often suffers from task conflict and oblivion. To mitigate these issues, we propose a novel "model MoE-ization" strategy that leads to a conflict- and oblivion-resistant multi-task adaptation method. Given a weight matrix of a pre-trained model, our method applies SVD to it and introduces a learnable router to adjust its singular values based on tasks and samples. Accordingly, the weight matrix becomes a Mixture of Orthogonal Rank-one Experts (MoORE), in which each expert corresponds to the outer product of a left singular vector and the corresponding right singular vector. We can improve the model capacity by imposing a learnable orthogonal transform on the right singular vectors. Unlike low-rank adaptation (LoRA) and its MoE-driven variants, MoORE guarantees the experts' orthogonality and maintains the column space of the original weight matrix. These two properties make the adapted model resistant to conflicts among the new tasks and to oblivion of its original tasks, respectively. Experiments on various datasets demonstrate that MoORE consistently outperforms existing multi-task adaptation methods, showing its superiority in terms of conflict- and oblivion-resistance. The code for the experiments is available at https://github.com/DaShenZi721/MoORE.
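The following is a minimal NumPy sketch of the forward pass that the abstract describes. Several pieces are hypothetical illustrations, not the code released in the MoORE repository:

- the router `router_gates`, here a sigmoid over the sum of a sample-level and a task-level linear logit, scaled to $(0, 2)$;
- the task embedding `task_vec`;
- the multiplicative gating $g_d \sigma_d$;
- the Householder parameterization of the orthogonal transform.

The sketch checks two claims from the abstract. First, the adapted output stays in the column space of $W$. Second, with a zero router and no reflections the layer reduces to the original $W\mathbf{x}$.

```python
import numpy as np


def householder_product(hs):
    """Product of reflections I - 2 h h^T / (h^T h), one per row of `hs`; always orthogonal."""
    n = hs.shape[1]
    H = np.eye(n)
    for h in hs:
        H = H @ (np.eye(n) - 2.0 * np.outer(h, h) / (h @ h))
    return H


def router_gates(x, task_vec, A_sample, A_task):
    """Hypothetical router: sample- plus task-level logits -> one gate per expert in (0, 2)."""
    logits = x @ A_sample + task_vec @ A_task
    return 2.0 / (1.0 + np.exp(-logits))  # a zero logit gives a gate of exactly 1


def moore_style_forward(W, x, task_vec, A_sample, A_task, hs):
    """Compute y = U diag(g * sigma) (H V)^T x for a frozen weight W (illustrative form)."""
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)  # thin SVD: expert d is u_d v_d^T
    V_rot = householder_product(hs) @ Vt.T                # rotate the right singular vectors
    g = router_gates(x, task_vec, A_sample, A_task)       # per-expert gates from task + sample
    return U @ (g * sigma * (V_rot.T @ x))                # weighted sum of rank-one experts


rng = np.random.default_rng(0)
m, n, t = 6, 4, 3                       # output dim, input dim, task-embedding dim
D = min(m, n)                           # number of rank-one experts
W = rng.normal(size=(m, n))             # stand-in for a pre-trained weight matrix
x = rng.normal(size=n)                  # one input sample
task_vec = rng.normal(size=t)           # hypothetical task embedding
A_sample = rng.normal(scale=0.1, size=(n, D))
A_task = rng.normal(scale=0.1, size=(t, D))
hs = rng.normal(size=(2, n))            # two Householder vectors

y = moore_style_forward(W, x, task_vec, A_sample, A_task, hs)

# Column-space preservation: projecting y onto Range(W) leaves it unchanged.
P = W @ np.linalg.pinv(W)
print(np.allclose(P @ y, y))            # True

# With a zero router and no reflections, the layer reduces to the original W x.
y0 = moore_style_forward(W, x, task_vec, np.zeros_like(A_sample), np.zeros_like(A_task),
                         np.zeros((0, n)))
print(np.allclose(y0, W @ x))           # True
```

The gates lie in $(0, 2)$ and equal 1 at zero logits. This is a design choice for the sketch: a zero-initialized router then leaves the pre-trained singular values untouched, so any change to the layer comes from the learned router and the orthogonal transform.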
