Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing

Song Gao, Shusen Jing, Shuai Zhang, Yue Wang, Xiangwei Zhou, Songyang Zhang

TL;DR

Networked MoE (NMoE) tackles the challenge of deploying and training large foundation models on edge devices with limited compute and storage. It splits the Mixture-of-Experts (MoE) components across edge clients and enables collaborative inference under privacy and bandwidth constraints. The paper proposes a three-stage federated training pipeline: Stage 1, federated training of the shared feature extractor (FE) using FedCE or FedSC; Stage 2, personalized expert training on local data; and Stage 3, gating network training with RanGate, RollGate, or FedGate. Experiments on CIFAR-10 under IID and non-IID settings show improved efficiency and robustness, with FedGate and FedSC delivering strong performance at reduced communication cost, underscoring the practical potential of NMoE for wireless management.

Abstract

Recent advancements in large artificial intelligence models (LAMs) are driving significant innovations in mobile edge computing within next-generation wireless networks. However, the substantial demands for computational resources and large-scale training data required to train LAMs conflict with the limited storage and computational capacity of edge devices, posing significant challenges to training and deploying LAMs at the edge. In this work, we introduce the Networked Mixture-of-Experts (NMoE) system, in which clients infer collaboratively by distributing tasks to suitable neighbors based on their expertise and aggregate the returned results. For training the NMoE, we propose a federated learning framework that integrates both supervised and self-supervised learning to balance personalization and generalization, while preserving communication efficiency and data privacy. We conduct extensive experiments to demonstrate the efficacy of the proposed NMoE system, providing insights and benchmarks for the NMoE training algorithms.

Paper Structure

This paper contains 19 sections, 8 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: An overview of centralized MoE (left) and NMoE (right). The traditional MoE requires all components to reside on the same edge device. In the proposed NMoE, each client locally deploys a cross-shared feature extractor (FE), a cross-shared gating network, and a personalized expert. During inference, a client first processes its input data through the FE to obtain latent features. These features are then passed to the gating network, which determines the most suitable experts (either local or on neighboring clients) for handling the data. The client subsequently distributes the extracted features according to the gating decisions, enabling the selected experts to perform specialized inference.
  • Figure 2: Overview of Federated Training of NMoE ($F$, $G$, $H_i$, and $a$ represent the Feature Extractor, Gating Network, Personalized Expert, and Aggregation Function, respectively): 1) Stage 1. The shared feature extractor is trained by federated self-supervised learning; 2) Stage 2. A local expert is trained on local data with the feature extractor fixed; 3) Stage 3. FedGate is trained by federated learning with $F$ and $H_i$ fixed.
  • Figure 3: Gating Pattern (LR refers to the average local processing ratio): a) FedSC with F1=54.79% and LR=25%; b) Centralized MoE with F1=90.25% and LR=8%; and c) Local classifier with F1=6.55% and LR=100%. "Client ID" refers to data initialization location; "Expert ID" refers to the selected expert.
  • Figure 4: Ablation Study.
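
The inference flow described in the Figure 1 caption (shared feature extractor, shared gating network, then personalized experts whose outputs are aggregated) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the linear layers, the dimensions, the top-k routing rule, the weighted-sum aggregation, and all names (`feature_extractor`, `gate`, `expert`, `nmoe_infer`, `TOP_K`) are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the source does not specify any of these.
NUM_CLIENTS, IN_DIM, FEAT_DIM, NUM_CLASSES, TOP_K = 4, 8, 16, 10, 2

# Shared by every client: feature extractor F and gating network G.
W_F = rng.normal(size=(IN_DIM, FEAT_DIM))
W_G = rng.normal(size=(FEAT_DIM, NUM_CLIENTS))
# One personalized expert H_i per client.
W_H = [rng.normal(size=(FEAT_DIM, NUM_CLASSES)) for _ in range(NUM_CLIENTS)]


def softmax(v):
    e = np.exp(v - v.max())  # subtract the max for numerical stability
    return e / e.sum()


def feature_extractor(x):
    """F: map a raw input to latent features (a single ReLU layer here)."""
    return np.maximum(x @ W_F, 0.0)


def gate(z, top_k=TOP_K):
    """G: score all experts, keep the top_k (assumed rule), renormalize."""
    scores = softmax(z @ W_G)
    chosen = np.argsort(scores)[::-1][:top_k]
    weights = scores[chosen] / scores[chosen].sum()
    return chosen, weights


def expert(i, z):
    """H_i: the personalized expert hosted on client i."""
    return softmax(z @ W_H[i])


def nmoe_infer(x):
    """Run F and G locally, send features to the chosen experts, aggregate."""
    z = feature_extractor(x)
    chosen, weights = gate(z)
    # In a real deployment each call below would be a network request
    # carrying the features z to the client that hosts expert i.
    outputs = [expert(int(i), z) for i in chosen]
    probs = sum(w * o for w, o in zip(weights, outputs))  # assumed aggregation
    return probs, chosen


probs, chosen = nmoe_infer(rng.normal(size=IN_DIM))
print("selected experts:", chosen, "predicted class:", int(np.argmax(probs)))
```

In this sketch only the latent features leave the client, never the raw input, which matches the caption's description of distributing extracted features according to the gating decisions.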