GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism
Bo Lv, Chen Tang, Zifan Zheng, Bohao Yang, Kun Zhao, Ning Liao, Xiaoxing Wang, Feiyu Xiong, Zhiyu Li, Nayu Liu, Jingchi Jiang
TL;DR
GRAPHMOE introduces a self-rethinking mechanism that connects MoE expert nodes through a recurrent routing process on a pseudo-graph, allowing representations to be refined over multiple rounds. Implemented with LoRA-based adapters, it achieves state-of-the-art results on multiple commonsense reasoning benchmarks and surpasses existing LoRA+MoE baselines. The design aims for balanced collaboration among experts while keeping the added complexity under control. Overall, the work shows that graph-based, multi-round routing can increase the cognitive depth of MoE architectures with modest parameter overhead, and it invites further exploration of iterative, graph-guided MoE designs.
Abstract
Traditional Mixture-of-Experts (MoE) networks benefit from using multiple smaller expert models instead of a single large network. However, these experts typically operate independently, which leaves open the question of whether interconnecting them could improve the performance of MoE networks. In response, we introduce GRAPHMOE, a novel method that aims to augment the cognitive depth of language models through a self-rethinking mechanism built on Pseudo GraphMoE networks. GRAPHMOE employs a recurrent routing strategy to simulate iterative thinking steps, thereby facilitating the flow of information among expert nodes. We implement the GRAPHMOE architecture using Low-Rank Adaptation (LoRA) and conduct extensive experiments on various benchmark datasets. The experimental results show that GRAPHMOE outperforms other LoRA-based models and achieves state-of-the-art (SOTA) performance. Additionally, this study explores a novel recurrent routing strategy that may inspire further advances in the reasoning capabilities of language models.
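The abstract describes recurrent routing only at a high level. The sketch below is one hypothetical reading of it in NumPy and is not the authors' implementation. Each expert is assumed to be a LoRA adapter (A, B) added to a frozen base projection. A linear router picks the top-k experts per token, and every later round routes on the previous round's mixture output, so the experts chosen in one round shape what the router sees in the next. The names `recurrent_moe`, `topk_gates`, `W_router`, and `rounds`, as well as the update rule `h = base + delta`, are illustrative assumptions. The paper's pseudo-graph construction may differ.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def topk_gates(logits, top_k):
    # Keep the top_k logits per token, mask the rest to -inf,
    # so that their softmax weight is exactly zero.
    idx = np.argsort(logits, axis=-1)[:, -top_k:]
    mask = np.full_like(logits, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=-1)
    return softmax(logits + mask)


def lora_expert(h, A, B, scale):
    # Low-rank update (h @ A) @ B with A: (d, r) and B: (r, d).
    return scale * (h @ A) @ B


def recurrent_moe(x, W_base, experts, W_router, top_k=2, rounds=3, scale=1.0):
    """Hypothetical multi-round LoRA-MoE layer.

    x: (n, d) token states; W_base: (d, d) frozen weight;
    experts: list of (A, B) LoRA pairs; W_router: (d, E).
    """
    base = x @ W_base                     # frozen base projection, computed once
    h = x                                 # round-0 input to the router and experts
    for _ in range(rounds):
        gates = topk_gates(h @ W_router, top_k)    # (n, E), sparse per token
        delta = np.zeros_like(base)
        for e, (A, B) in enumerate(experts):
            delta += gates[:, e:e + 1] * lora_expert(h, A, B, scale)
        # Assumed "rethinking" step: the next round re-reads the mixed output.
        h = base + delta
    return h
```

With `rounds=1` this reduces to an ordinary single-pass top-k LoRA mixture. If every B is initialized to zero, as is conventional for LoRA, the adapters contribute nothing and the output equals the frozen base projection for any number of rounds.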
