
EMS-FL: Federated Tuning of Mixture-of-Experts in Satellite-Terrestrial Networks via Expert-Driven Model Splitting

Angzi Xu, Zezhong Zhang, Zhi Liu, Shuguang Cui

TL;DR

This paper proposes EMS-FL, an expert-driven model splitting and federated learning method that reduces training overhead and achieves both faster convergence and higher accuracy than conventional federated learning.

Abstract

The rapid advancement of large AI models imposes stringent demands on data volume and computational resources. Federated learning, though designed to exploit distributed data and computational resources, faces data shortages from limited network coverage and computational constraints from edge devices. The mixture-of-experts (MoE) architecture and the satellite-terrestrial network (STN) offer promising solutions to these two issues, providing lightweight computation and broad coverage, respectively. However, satellite-ground relative motion results in intermittent connectivity, which hinders conventional federated learning because it relies on model synchronization across devices. To leverage the coverage of the STN while preserving training efficiency, we propose EMS-FL, an expert-driven model splitting and federated learning method. EMS-FL assigns each device cluster only the experts highly correlated with its local data. Because the expert assignments do not overlap, we further propose asynchronous local learning, in which each device cluster trains its assigned experts continuously and uploads local parameters to the satellite only during connected phases for aggregation and model updates. Consequently, EMS-FL reduces the training overhead and achieves both faster convergence and higher accuracy than conventional federated learning. A rigorous convergence analysis theoretically characterizes the learning performance. Furthermore, comprehensive experiments on public datasets and large models validate the superiority of EMS-FL.
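
The two mechanisms described in the abstract, non-overlapping expert assignment and upload-on-contact training, can be illustrated with a toy sketch. This is not the paper's algorithm: the relevance matrix, the greedy "highest score wins" rule, the function names `assign_experts` and `run_slots`, and the slot-based connectivity schedule are all assumptions made here for illustration, and the sketch only tracks bookkeeping (who owns which expert, how many local steps reach the satellite), not actual model parameters.

```python
def assign_experts(relevance):
    """Give each expert to the single cluster with the highest relevance score.

    relevance[c][e] is a hypothetical score, e.g. how often cluster c's
    local data is routed to expert e. Assignments never overlap.
    """
    num_clusters = len(relevance)
    num_experts = len(relevance[0])
    assignment = {c: [] for c in range(num_clusters)}
    for e in range(num_experts):
        owner = max(range(num_clusters), key=lambda c: relevance[c][e])
        assignment[owner].append(e)
    return assignment


def run_slots(assignment, connectivity):
    """Track how many local steps per expert have reached the satellite.

    connectivity[t] is the set of clusters visible to the satellite in
    slot t (an assumed schedule). Clusters keep training while
    disconnected and upload buffered progress when they become visible.
    """
    pending = {c: 0 for c in assignment}  # local steps since last upload
    uploaded = {e: 0 for experts in assignment.values() for e in experts}
    for visible in connectivity:
        for c in assignment:
            pending[c] += 1  # local training continues in every slot
        for c in visible:
            for e in assignment[c]:
                uploaded[e] += pending[c]  # no conflict: experts are disjoint
            pending[c] = 0
    return uploaded


relevance = [
    [0.6, 0.1, 0.2, 0.1],  # cluster 0
    [0.1, 0.7, 0.1, 0.1],  # cluster 1
    [0.3, 0.2, 0.7, 0.8],  # cluster 2
]
assignment = assign_experts(relevance)
uploaded = run_slots(assignment, connectivity=[{0}, {1}, {0, 2}])
```

Because each expert has exactly one owning cluster in this sketch, an upload from one cluster never has to be reconciled with another cluster's copy of the same expert, which is the property that lets clusters proceed without waiting for each other.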

Paper Structure

This paper contains 19 sections, 2 theorems, 68 equations, 9 figures, 2 tables, 3 algorithms.

Key Result

Theorem 1

After $T$ orbital-cycle training with step-sizes $\eta^{\mathrm E} \le \frac{\eta^{\mathrm U}}{\gamma} \le \frac{1}{\sqrt{T}}$ in EMS-FL, the gradient variance is upper bounded by

Figures (9)

  • Figure 1: Illustration of large model tuning via FL under STN.
  • Figure 2: The architecture of transformer-based MoE [Xue2024WDMoE].
  • Figure 3: The instantaneous coverage of an LEO satellite.
  • Figure 4: Asynchronous federated learning across device clusters.
  • Figure 5: Test accuracy versus number of orbital cycles for EMS-FL and baseline scheme under different step-sizes.
  • ...and 4 more figures

Theorems & Definitions (8)

  • Remark 1: Model Update with LoRA
  • Remark 2: Expert Relevant Probability
  • Remark 3: Global Model Update
  • Theorem 1: Convergence of EMS-FL
  • proof
  • Theorem 2: Convergence of Baseline
  • proof
  • Remark 4: Idle Connection Slots