EMS-FL: Federated Tuning of Mixture-of-Experts in Satellite-Terrestrial Networks via Expert-Driven Model Splitting

Angzi Xu; Zezhong Zhang; Zhi Liu; Shuguang Cui

TL;DR

EMS-FL is an expert-driven model splitting and federated learning method that reduces training overhead and achieves both faster convergence and higher accuracy than conventional federated learning.

Abstract

The rapid advancement of large AI models imposes stringent demands on data volume and computational resources. Federated learning, though designed to exploit distributed data and computational resources, faces data shortages caused by limited network coverage and computational constraints imposed by edge devices. To address these issues, the mixture-of-experts (MoE) architecture and the satellite-terrestrial network (STN) provide promising solutions, offering lightweight computation and broad coverage, respectively. However, the relative motion between satellites and the ground results in intermittent connectivity, hindering conventional federated learning, which relies on model synchronization across devices. To leverage the coverage of STN while preserving training efficiency, we propose EMS-FL, an expert-driven model splitting and federated learning method. EMS-FL assigns each device cluster only the experts highly correlated with its local data. Building on these non-overlapping expert assignments, we further propose asynchronous local learning, in which each device cluster trains its assigned experts consecutively and uploads local parameters to the satellite only during connected phases for aggregation and model updates. Consequently, EMS-FL reduces the training overhead and achieves both faster convergence and higher accuracy than conventional federated learning. A rigorous convergence analysis theoretically characterizes the learning performance, and comprehensive experiments on public datasets and large models validate the superiority of EMS-FL.
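To make the two ideas in the abstract concrete, the following is a minimal sketch of non-overlapping expert assignment followed by an upload that touches only the assigned experts. It is an illustration under stated assumptions, not the paper's algorithm: the `relevance` scores, the rule of giving each expert to its highest-scoring cluster, and the helper names `assign_experts` and `merge_upload` are all hypothetical, and each expert's parameters are reduced to a single float for brevity.

```python
from typing import Dict, List


def assign_experts(relevance: List[List[float]]) -> Dict[int, List[int]]:
    """Give each expert to the single cluster with the highest relevance score.

    relevance[c][e] is a hypothetical score describing how strongly the local
    data of cluster c is associated with expert e. Because every expert gets
    exactly one owner, the assignments never overlap.
    """
    num_clusters = len(relevance)
    num_experts = len(relevance[0])
    assignment: Dict[int, List[int]] = {c: [] for c in range(num_clusters)}
    for e in range(num_experts):
        # Ties are broken in favour of the lowest cluster index.
        owner = max(range(num_clusters), key=lambda c: relevance[c][e])
        assignment[owner].append(e)
    return assignment


def merge_upload(
    global_experts: Dict[int, float],
    local_experts: Dict[int, float],
    owned: List[int],
) -> Dict[int, float]:
    """Copy only the owned experts from a cluster's upload into the global model.

    Experts owned by other clusters are left untouched, so uploads arriving at
    different times cannot conflict with each other.
    """
    updated = dict(global_experts)
    for e in owned:
        updated[e] = local_experts[e]
    return updated


# Two clusters, four experts (illustrative numbers only).
relevance = [
    [0.7, 0.1, 0.6, 0.2],  # cluster 0
    [0.3, 0.9, 0.4, 0.8],  # cluster 1
]
assignment = assign_experts(relevance)

# Cluster 0 becomes connected and uploads the experts it trained locally.
global_experts = {e: 0.0 for e in range(4)}
global_experts = merge_upload(global_experts, {0: 1.5, 2: -0.5}, assignment[0])

print(assignment)
print(global_experts)
```

In this toy setup, cluster 0 owns experts 0 and 2 and cluster 1 owns experts 1 and 3, so the upload from cluster 0 changes only its own two entries in the global model. A real system would replace the float placeholders with parameter tensors (or LoRA updates) and would derive the relevance scores from routing statistics.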

Paper Structure (19 sections, 2 theorems, 68 equations, 9 figures, 2 tables, 3 algorithms)

Introduction
System Model
Learning Model
Federated Learning
MoE Fine-tuning via FL
Satellite-Ground Communication Model
EMS-FL for STN-Assisted MoE Fine-Tuning
Synchronous FL Baseline
EMS-FL for MoE
Enhanced EMS-FL
Convergence Analysis
Simulation Results
Experimental Settings
Experiment 1: MoE Fine-tuning with Top-1 Routing
Experiment 2: MoE Training with Top-2 Routing
...and 4 more sections

Key Result

Theorem 1

After $T$ orbital-cycle training with step-sizes $\eta^{\mathrm E} \le \frac{\eta^{\mathrm U}}{\gamma} \le \frac{1}{\sqrt{T}}$ in EMS-FL, the gradient variance is upper bounded by

Figures (9)

Figure 1: Illustration of large model tuning via FL under STN.
Figure 2: The architecture of transformer-based MoE [Xue2024WDMoE].
Figure 3: The instantaneous coverage of an LEO satellite.
Figure 4: Asynchronous federated learning across device clusters.
Figure 5: Test accuracy versus number of orbital cycles for EMS-FL and the baseline scheme under different step-sizes.
...and 4 more figures

Theorems & Definitions (8)

Remark 1: Model Update with LoRA
Remark 2: Expert Relevant Probability
Remark 3: Global Model Update
Theorem 1: Convergence of EMS-FL
proof
Theorem 2: Convergence of Baseline
proof
Remark 4: Idle Connection Slots
