PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning

Yu Feng, Yangli-ao Geng, Yifan Zhu, Zongfu Han, Xie Yu, Kaiwen Xue, Haoran Luo, Mengyang Sun, Guangwei Zhang, Meina Song

TL;DR

PM-MOE tackles data heterogeneity in federated learning with a two-stage framework that combines a mixture of personalized modules (MPM) with an energy-based denoising mechanism (EDM). The approach first pre-trains the global and personalized components, then uses a gating network to selectively fuse beneficial personalized parameters from other clients, with EDM discarding noisy contributions. A theoretical analysis gives a lower bound on the final accuracy of the expert mixture, and experiments show consistent improvements when PM-MOE is applied to nine state-of-the-art PFL methods across six datasets and two heterogeneity settings, at modest additional training cost. The work highlights the practical potential of sharing personalized parameters across clients while preserving privacy, and provides insights into gating design and parameter selection for MoE in PFL contexts.

Abstract

Federated learning (FL) has gained widespread attention for its privacy-preserving and collaborative learning capabilities. Due to significant statistical heterogeneity, traditional FL struggles to generalize a shared model across diverse data domains. Personalized federated learning addresses this issue by dividing the model into a globally shared part and a locally private part, with the local model correcting representation biases introduced by the global model. Nevertheless, locally converged parameters more accurately capture domain-specific knowledge, and current methods overlook the potential benefits of these parameters. To address these limitations, we propose the PM-MoE architecture. This architecture integrates a mixture of personalized modules with energy-based denoising of personalized modules, enabling each client to select beneficial personalized parameters from other clients. We applied the PM-MoE architecture to nine recent model-split-based personalized federated learning algorithms, achieving performance improvements with minimal additional training. Extensive experiments on six widely adopted datasets and two heterogeneity settings validate the effectiveness of our approach. The source code is available at https://github.com/dannis97500/PM-MOE.
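The idea of gating over other clients' personalized modules while filtering out noisy ones can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`pm_moe_forward`, `energy`), the choice to mix the outputs of frozen linear personalized heads, the free-energy score `-logsumexp(logits)`, and the `keep` parameter are all assumptions made for illustration, and the paper's EDM may be defined differently.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def energy(logits):
    # Assumed energy score: E = -logsumexp(logits).
    # Lower energy means the head is more confident on the sample.
    m = logits.max(axis=-1, keepdims=True)
    return -(m.squeeze(-1) + np.log(np.exp(logits - m).sum(axis=-1)))

def pm_moe_forward(features, heads, gate_w, keep=2):
    """Hypothetical gated mixture over frozen personalized heads.

    features: (B, D) outputs of the shared feature extractor
    heads:    list of M (W, b) pairs, W of shape (D, C), b of shape (C,)
    gate_w:   (D, M) weights of a linear gating network
    keep:     number of lowest-energy heads kept per sample
    """
    # Logits of every personalized head: (M, B, C).
    expert_logits = np.stack([features @ W + b for W, b in heads])
    # Gate scores per sample and head: (B, M).
    gate = softmax(features @ gate_w, axis=-1)
    # Energy per sample and head: (B, M).
    e = energy(expert_logits).T
    # Denoising step (assumed form): keep the `keep` lowest-energy heads.
    order = np.argsort(e, axis=-1)
    mask = np.zeros_like(gate)
    np.put_along_axis(mask, order[:, :keep], 1.0, axis=-1)
    gate = gate * mask
    gate = gate / gate.sum(axis=-1, keepdims=True)
    # Gate-weighted fusion of head outputs: (B, C).
    fused = np.einsum("bm,mbc->bc", gate, expert_logits)
    return fused, gate

# Toy usage with random data: 5 clients' heads, 4 samples, 3 classes.
rng = np.random.default_rng(0)
B, D, C, M = 4, 8, 3, 5
features = rng.normal(size=(B, D))
heads = [(rng.normal(size=(D, C)), rng.normal(size=C)) for _ in range(M)]
gate_w = rng.normal(size=(D, M))
fused, gate = pm_moe_forward(features, heads, gate_w, keep=2)
```

In this sketch the heads stay frozen and only `gate_w` would be trained in the second stage, which is one plausible reading of "minimal additional training"; the actual trainable components are described in the paper and the linked repository.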

Paper Structure

This paper contains 35 sections, 1 theorem, 15 equations, 6 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

(Lower Bound on the Final Accuracy of MPE) Suppose there are $M (\geq 2)$ client experts predicting independently, each with an average accuracy rate of $p (>0)$. If a trained gate network assigns samples to the client experts such that the ratio of the probability of assigning a sample to a correct

Figures (6)

  • Figure 1: Motivation of our study. (A) t-SNE graph shows the inference effects of different models on the same set of data. (B) Client A gets closer to the target when using Client B's personalized model, but moves farther from the target when using Client C's personalized model.
  • Figure 2: Overall Architecture of Personalized Model parameters with Mixture of Experts
  • Figure 3: Diagram of Mixture of Personalized Parameters.
  • Figure 4: Diagram of Mixture of Personalized Experts.
  • Figure 5: Results of Gating Network Parameters
  • ...and 1 more figure

Theorems & Definitions (1)

  • Theorem 1