$\texttt{MoE-RBench}$: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen, Xinyu Zhao, Tianlong Chen, Yu Cheng
TL;DR
MoE-RBench introduces a comprehensive reliability benchmark for Sparse Mixture-of-Experts models, evaluating safety, hallucination, adversarial robustness, and OOD performance. By analyzing multiple MoE architectures and training/inference strategies, the work demonstrates that with appropriate router tuning, data augmentation, and decoding techniques, MoE models can match or surpass the reliability of dense LLMs, especially under adversarial and distribution-shift conditions. The study also highlights that routing dynamics, expert dropout, and load-balancing losses substantially influence robustness, offering practical guidance for deploying MoE in high-security tasks. Overall, MoE-RBench provides actionable insights and datasets to advance trustworthy MoE-based language models in real-world settings.
Abstract
Mixture-of-Experts (MoE) has gained increasing popularity as a promising framework for scaling up large language models (LLMs). However, the reliability assessment of MoE lags behind its surging applications. Moreover, when transferred to new domains, such as in fine-tuning, MoE models sometimes underperform their dense counterparts. Motivated by this research gap and counter-intuitive phenomenon, we propose $\texttt{MoE-RBench}$, the first comprehensive assessment of sparse MoE (SMoE) reliability from three aspects: $\textit{(i)}$ safety and hallucination, $\textit{(ii)}$ resilience to adversarial attacks, and $\textit{(iii)}$ out-of-distribution robustness. Extensive models and datasets are tested to compare MoE to dense networks along these reliability dimensions. Our empirical observations suggest that with appropriate hyperparameters, training recipes, and inference techniques, we can build MoE models that are more reliable than dense LLMs. In particular, we find that the robustness of SMoE is sensitive to the basic training settings. We hope that this study can provide deeper insights into how to adapt pre-trained MoE models to other tasks with higher generation security, quality, and stability. Code is available at https://github.com/UNITES-Lab/MoE-RBench
