Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing
Song Gao, Shusen Jing, Shuai Zhang, Yue Wang, Xiangwei Zhou, Songyang Zhang
TL;DR
Networked MoE (NMoE) addresses the challenge of training and deploying large foundation models on edge devices with limited compute and storage. It distributes the components of a Mixture-of-Experts (MoE) model across edge clients, which then infer collaboratively under privacy and bandwidth constraints. Training follows a three-stage federated pipeline: Stage 1 is federated training of the shared feature extractor (FE) using FedCE or FedSC; Stage 2 is personalized expert training on local data; Stage 3 is gating network training with RanGate, RollGate, or FedGate. Experiments on CIFAR-10 under IID and non-IID settings show improved efficiency and robustness, with FedGate and FedSC delivering strong performance at reduced communication cost, underscoring the practical potential of NMoE for wireless management.
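Stage 1 of the pipeline aggregates a shared model across clients. The sketch below illustrates one such aggregation round, assuming plain sample-weighted federated averaging (FedAvg) as a stand-in; it is not the FedCE or FedSC procedure, whose details are not given in this summary. The function name `federated_average` and the dictionary-of-lists parameter layout are hypothetical choices made for illustration.

```python
from typing import Dict, List

# Assumption: each client's shared-model parameters are a dict mapping a
# parameter name to a flat list of floats. This layout is illustrative only.
Params = Dict[str, List[float]]


def federated_average(client_params: List[Params], num_samples: List[int]) -> Params:
    """Sample-weighted average of client parameters (plain FedAvg, an assumption)."""
    if not client_params or len(client_params) != len(num_samples):
        raise ValueError("need one sample count per client and at least one client")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Each coordinate is the weighted mean over clients, with weights
        # proportional to the number of local training samples.
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, num_samples)) / total
            for i in range(length)
        ]
    return averaged
```

A server would call `federated_average` once per communication round and send the result back to the clients. Weighting by sample count is a common default; under non-IID data other weightings may behave differently.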
Abstract
Recent advancements in large artificial intelligence models (LAMs) are driving significant innovations in mobile edge computing within next-generation wireless networks. However, training LAMs demands substantial computational resources and large-scale training data, which conflicts with the limited storage and computational capacity of edge devices and poses significant challenges to training and deploying LAMs at the edge. In this work, we introduce the Networked Mixture-of-Experts (NMoE) system, in which clients infer collaboratively by distributing tasks to suitable neighbors based on their expertise and aggregating the returned results. To train the NMoE, we propose a federated learning framework that integrates supervised and self-supervised learning to balance personalization and generalization while preserving communication efficiency and data privacy. We conduct extensive experiments to demonstrate the efficacy of the proposed NMoE system, providing insights and benchmarks for the NMoE training algorithms.
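The collaborative inference described above (route a task to suitable neighbors, then aggregate what they return) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes top-k routing on gate scores and a softmax-weighted sum of the selected experts' outputs, and the names `collaborative_infer`, `gate_scores`, and `top_k` are hypothetical. Each neighbor's expert is modeled as a local Python callable, whereas a real system would send the features over the network.

```python
import math
from typing import Callable, List, Sequence

# Assumption: an expert maps a feature vector to a vector of class scores.
Expert = Callable[[Sequence[float]], List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def collaborative_infer(
    features: Sequence[float],
    gate_scores: Sequence[float],
    experts: Sequence[Expert],
    top_k: int = 2,
) -> List[float]:
    """Query the top_k highest-scoring experts and blend their outputs."""
    if len(gate_scores) != len(experts):
        raise ValueError("need exactly one gate score per expert")
    if not 1 <= top_k <= len(experts):
        raise ValueError("top_k must be between 1 and the number of experts")

    # Select the neighbors the gate considers most suitable for this input.
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:top_k]

    # Renormalize the gate scores over the selected experts only.
    weights = softmax([gate_scores[i] for i in chosen])

    # In a deployed system each call would be a request to a neighbor client.
    outputs = [experts[i](features) for i in chosen]

    dim = len(outputs[0])
    return [sum(w * out[j] for w, out in zip(weights, outputs)) for j in range(dim)]
```

Restricting the query to `top_k` neighbors bounds the per-inference communication, which is the usual motivation for sparse gating in bandwidth-limited settings.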
