PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning
Yu Feng, Yangli-ao Geng, Yifan Zhu, Zongfu Han, Xie Yu, Kaiwen Xue, Haoran Luo, Mengyang Sun, Guangwei Zhang, Meina Song
TL;DR
PM-MoE tackles data heterogeneity in federated learning with a two-stage framework that combines a mixture of personalized modules (MPM) with an energy-based denoising mechanism (EDM). The approach first pre-trains the global and personalized components, then uses a gating network to selectively fuse beneficial personalized parameters from other clients, discarding noisy contributions via EDM. Theoretical guarantees show convergence to a lower bound, and extensive experiments demonstrate consistent improvements when PM-MoE is applied to nine state-of-the-art personalized federated learning (PFL) methods across six datasets and two heterogeneity settings, with modest additional training cost. The work highlights the practical potential of cross-client personalization sharing while preserving privacy, and provides insights into gating design and parameter selection for mixture of experts (MoE) in PFL contexts.
Abstract
Federated learning (FL) has gained widespread attention for its privacy-preserving and collaborative learning capabilities. Due to significant statistical heterogeneity, traditional FL struggles to generalize a shared model across diverse data domains. Personalized federated learning addresses this issue by dividing the model into a globally shared part and a locally private part, with the local model correcting representation biases introduced by the global model. Nevertheless, locally converged parameters capture domain-specific knowledge more accurately, and current methods overlook the potential benefits of sharing these parameters across clients. To address this limitation, we propose the PM-MoE architecture. It integrates a mixture of personalized modules with energy-based denoising of those modules, enabling each client to select beneficial personalized parameters from other clients. We applied the PM-MoE architecture to nine recent model-split-based personalized federated learning algorithms, achieving performance improvements with minimal additional training. Extensive experiments on six widely adopted datasets and two heterogeneity settings validate the effectiveness of our approach. The source code is available at https://github.com/dannis97500/PM-MOE.
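To make the idea of selecting and fusing personalized modules concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each personalized module is a linear classification head, scores each head on the client's local features with a free-energy value E = -logsumexp(logits), keeps the lowest-energy heads, and mixes their outputs with a softmax gate. The energy definition, the `keep_ratio` rule, and all function and parameter names are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def logsumexp(z, axis=-1):
    """Numerically stable log-sum-exp along one axis."""
    m = np.max(z, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(z - m), axis=axis, keepdims=True))).squeeze(axis)


def softmax(z, axis=-1):
    """Numerically stable softmax along one axis."""
    e = np.exp(z - np.max(z, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)


def fuse_personalized_heads(features, heads, gate_w, keep_ratio=0.5):
    """Hypothetical fusion of personalized heads collected from several clients.

    features: (n, d) local features from the shared extractor
    heads:    list of k weight matrices, each (d, c), one per client
    gate_w:   (d, k) weights of a linear gating network (assumed form)
    """
    # Logits produced by every head on the local data: shape (k, n, c).
    logits = np.stack([features @ w for w in heads])

    # Assumed energy score: mean free energy per head over local samples.
    # Lower energy is treated as "more compatible with the local data".
    energy = -logsumexp(logits, axis=-1).mean(axis=1)  # shape (k,)

    # Denoising step (assumed rule): keep only the lowest-energy heads.
    n_keep = max(1, int(round(keep_ratio * len(heads))))
    keep = np.sort(np.argsort(energy)[:n_keep])

    # The gate assigns per-sample mixture weights over the kept heads.
    gate = softmax((features @ gate_w)[:, keep], axis=-1)  # shape (n, n_keep)

    # Weighted sum of the kept heads' logits: shape (n, c).
    fused = np.einsum("nk,knc->nc", gate, logits[keep])
    return fused, keep, gate
```

In this sketch the gate is restricted to the heads that survive the energy filter, so a discarded head contributes nothing to the fused prediction. In a real system `gate_w` would be trained on the client's local data while the collected heads stay frozen or are fine-tuned, which is a design choice this sketch leaves open.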
