GMoE: Empowering LLMs Fine-Tuning via MoE Graph Collaboration
Ting Bai, Yue Yu, Le Huang, Zenan Xu, Chuan Shi
TL;DR
This work tackles instability and load imbalance in sparse Mixture-of-Experts (MoE) models during large language model (LLM) fine-tuning by introducing GMoE, a graph-based MoE whose graph router enables explicit collaboration among experts. It adds two coordination strategies, a Poisson distribution-based distinction strategy that promotes expert specialization and a Normal distribution-based balance strategy that regulates expert workload, and implements the architecture within a parameter-efficient fine-tuning framework using LoRA. Empirical results on four real-world benchmarks across multiple base LLMs show that GMoE achieves state-of-the-art accuracy with improved stability (lower standard deviation, Std) while using fewer trainable parameters, which the authors attribute to graph-based routing and efficient LoRA updates. The framework offers a scalable alternative to conventional router designs in which experts exchange information, with practical implications for stable and efficient fine-tuning of LLMs.
Abstract
The sparse Mixture-of-Experts (MoE) architecture of large language models (LLMs) suffers from an inherent load imbalance caused by its simple linear routing strategy, which ultimately leads to unstable and inefficient learning. To address this challenge, we introduce $\textbf{GMoE}$, a novel graph-based MoE framework that enhances collaboration among multiple experts. In GMoE, a graph router function is designed to capture the collaboration signals among experts. This enables all experts to dynamically allocate information derived from the input data by sharing information with their neighboring experts. Moreover, we propose two coordination strategies in GMoE, the $\textit{Poisson distribution-based distinction strategy}$ and the $\textit{Normal distribution-based balance strategy}$, to further unlock the capacity of each expert and improve model stability during LLM fine-tuning. Specifically, we use a parameter-efficient fine-tuning technique, Low-Rank Adaptation (LoRA), to implement the graph MoE architecture. Extensive experiments on four real-world benchmark datasets demonstrate the effectiveness of GMoE and show the benefits of facilitating collaboration among multiple experts in LLM fine-tuning. The code for the experimental implementation is available at https://github.com/BAI-LAB/GMoE
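
To make the idea of a graph router concrete, the following is a minimal sketch, not the GMoE implementation. It assumes that each expert is a node in a fixed graph, that a linear router first produces one score per expert, and that each score is then averaged with the scores of neighboring experts before top-k selection. The ring topology, the mean aggregation, and the names `graph_router`, `linear_router_scores`, `adjacency`, and `top_k` are all illustrative assumptions; the paper's actual router and its two coordination strategies are not reproduced here.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear_router_scores(token, router_weights):
    # Conventional linear router: one dot-product score per expert.
    return [sum(w * x for w, x in zip(row, token)) for row in router_weights]


def graph_router(token, router_weights, adjacency, top_k=2):
    """Hypothetical graph router: mix each expert's score with its neighbors' scores."""
    scores = linear_router_scores(token, router_weights)
    n = len(scores)
    mixed = []
    for i in range(n):
        # Mean aggregation over expert i and its graph neighbors (self-loop included).
        group = [j for j in range(n) if j == i or adjacency[i][j]]
        mixed.append(sum(scores[j] for j in group) / len(group))
    probs = softmax(mixed)
    # Keep the top-k experts and renormalize their gate weights to sum to 1.
    chosen = sorted(range(n), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


random.seed(0)
num_experts, dim = 4, 8
router_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
# Ring graph over the experts: an assumed topology, chosen only for illustration.
adjacency = [
    [1 if abs(i - j) in (1, num_experts - 1) else 0 for j in range(num_experts)]
    for i in range(num_experts)
]
token = [random.gauss(0, 1) for _ in range(dim)]
gates = graph_router(token, router_weights, adjacency)
print(gates)  # maps each selected expert index to its gate weight
```

In this sketch an expert with a high raw score raises the scores of its neighbors, so routing decisions depend on groups of related experts rather than on each expert in isolation. In a LoRA-based setup, each selected expert would correspond to a low-rank adapter whose output is weighted by the returned gate value.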
