MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models
Jie Cao, Tianwei Lin, Bo Yuan, Rolan Yan, Hongyang He, Wenqiao Zhang, Juncheng Li, Dongping Zhang, Siliang Tang, Yueting Zhuang
TL;DR
This work targets two weaknesses of homogeneous MoE-LoRA designs for parameter-efficient fine-tuning (PEFT) of large language models: representation collapse and expert load imbalance. It introduces Mixture-of-Adapters (MoA), a heterogeneous ensemble of PEFT adapters with token-level routing, in two variants: Soft MoA, which fuses all expert outputs using weights from a sigmoid router, and Sparse MoA, which activates experts per token against a learnable threshold $\Gamma = \Gamma_{\max}\,\mathrm{Sigmoid}(\boldsymbol{W}_{\Gamma}^{\top}\boldsymbol{x} + \boldsymbol{b}_{\Gamma})$. MoA combines structurally diverse adapters (five LoRA modules, FFN Parallel Adapters, and a zero-initialized Prompt Tuning module) to promote expert specialization and efficient knowledge transfer, achieving higher accuracy and better resource efficiency than state-of-the-art homogeneous MoE-LoRA baselines on math, commonsense, and code-generation tasks. Across multiple foundation models, Soft MoA and Sparse MoA also show better training efficiency, a smaller memory footprint, and lower inference latency while using far fewer trainable parameters, underscoring the practical value of architectural heterogeneity in PEFT for LLMs.
Abstract
Recent studies integrate Low-Rank Adaptation (LoRA) and Mixture-of-Experts (MoE) to further enhance the performance of parameter-efficient fine-tuning (PEFT) methods in Large Language Model (LLM) applications. Existing methods employ homogeneous MoE-LoRA architectures composed of LoRA experts with either similar or identical structures and capacities. However, these approaches often suffer from representation collapse and expert load imbalance, which negatively impact the potential of LLMs. To address these challenges, we propose a heterogeneous Mixture-of-Adapters (MoA) approach. This method dynamically integrates PEFT adapter experts with diverse structures, leveraging their complementary representational capabilities to foster expert specialization, thereby enhancing the effective transfer of pre-trained knowledge to downstream tasks. MoA supports two variants: (i) Soft MoA achieves fine-grained integration by performing a weighted fusion of all expert outputs; (ii) Sparse MoA activates adapter experts sparsely based on their contribution, achieving this with negligible performance degradation. Experimental results demonstrate that heterogeneous MoA outperforms homogeneous MoE-LoRA methods in both performance and parameter efficiency. Our project is available at https://github.com/DCDmllm/MoA.
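
To make the two variants concrete, the following is a minimal pure-Python sketch of token-level routing over heterogeneous experts. It is an illustration under stated assumptions, not the authors' implementation: the function names (`router_gates`, `token_threshold`, `soft_moa`, `sparse_moa`), the toy experts, and all numeric values are hypothetical. In particular, the rule that an expert is active when its gate exceeds the per-token threshold $\Gamma$ is an assumed reading of "activates adapter experts sparsely based on their contribution"; the released code at the project URL is the authoritative reference.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]  # any adapter: token vector in, vector out


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(u * v for u, v in zip(a, b))


def router_gates(x: Sequence[float], W_r: Sequence[Sequence[float]],
                 b_r: Sequence[float]) -> List[float]:
    """One independent sigmoid gate per expert (gates need not sum to 1)."""
    return [sigmoid(dot(w, x) + b) for w, b in zip(W_r, b_r)]


def token_threshold(x: Sequence[float], w_gamma: Sequence[float],
                    b_gamma: float, gamma_max: float) -> float:
    """Per-token threshold: Gamma = Gamma_max * Sigmoid(w_gamma . x + b_gamma)."""
    return gamma_max * sigmoid(dot(w_gamma, x) + b_gamma)


def soft_moa(x: Sequence[float], experts: Sequence[Expert],
             gates: Sequence[float]) -> Vector:
    """Soft variant: weighted fusion of ALL expert outputs."""
    out = [0.0] * len(x)
    for g, expert in zip(gates, experts):
        out = [o + g * y for o, y in zip(out, expert(x))]
    return out


def sparse_moa(x: Sequence[float], experts: Sequence[Expert],
               gates: Sequence[float], gamma: float) -> Tuple[Vector, List[bool]]:
    """Sparse variant. ASSUMPTION: an expert is active iff its gate > gamma."""
    active = [g > gamma for g in gates]
    out = [0.0] * len(x)
    for g, keep, expert in zip(gates, active, experts):
        if keep:  # inactive experts are never evaluated
            out = [o + g * y for o, y in zip(out, expert(x))]
    return out, active


# Toy stand-ins for structurally different adapters (hypothetical).
experts: List[Expert] = [
    lambda x: [2.0 * v for v in x],  # stands in for a LoRA-style update
    lambda x: [-v for v in x],       # stands in for a parallel adapter
    lambda x: [0.0 for _ in x],      # stands in for a zero-initialized prompt module
]

x = [1.0, -1.0]
gates = router_gates(x, W_r=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], b_r=[0.0, 0.0, 0.0])
gamma = token_threshold(x, w_gamma=[0.0, 0.0], b_gamma=0.0, gamma_max=0.8)
soft_out = soft_moa(x, experts, gates)
sparse_out, active = sparse_moa(x, experts, gates, gamma)
```

With these toy values the threshold is 0.4, so the second expert (gate about 0.27) is skipped in the sparse path while the soft path still includes its contribution. Using independent sigmoid gates rather than a softmax means experts do not compete for a fixed weight budget, which is what makes a single per-token threshold a natural activation test in this sketch.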
