dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis

Luyuan Xie, Tianyu Luan, Wenyuan Cai, Guochen Yan, Zhaoyu Chen, Nan Xi, Yuejian Fang, Qingni Shen, Zhonghai Wu, Junsong Yuan

TL;DR

dFLMoE tackles the privacy-preserving collaboration challenge in medical data by removing centralized aggregation and server dependency. It introduces a decentralized framework where each client trains a local body and head, exchanges lightweight head models with peers, and employs a client-specific Mixture of Experts with a feature-space transform and cross-attention to fuse knowledge. The method supports both homogeneous and heterogeneous client models and demonstrates robustness to network disruptions while maintaining high performance on five non-IID medical tasks. These results highlight the practical potential of decentralized MoE-based fusion for privacy-preserving, robust medical analytics.

Abstract

Federated learning has wide applications in the medical field. It enables knowledge sharing among healthcare institutions while protecting patients' privacy. However, existing federated learning systems are typically centralized, requiring clients to upload client-specific knowledge to a central server for aggregation. Because the server integrates the knowledge from all clients, that knowledge can already be degraded during aggregation, before it is sent back to each client. The centralized approach also creates a dependency on the central server, which may affect training stability if the server malfunctions or connections are unstable. To address these issues, we propose a decentralized federated learning framework named dFLMoE. In our framework, clients directly exchange lightweight head models with each other. After the exchange, each client treats both its local head and the received heads as individual experts, and uses a client-specific Mixture of Experts (MoE) approach to make collective decisions. This design not only reduces knowledge damage through client-specific aggregation but also removes the dependency on the central server, which enhances the robustness of the framework. We validate our framework on multiple medical tasks, demonstrating that our method clearly outperforms state-of-the-art approaches under both model-homogeneous and model-heterogeneous settings.

Paper Structure

This paper contains 19 sections, 12 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: (a) Previous centralized federated learning frameworks aggregate knowledge from each client in a central server. This process can lead to knowledge damage during centralized aggregation, and the framework is heavily dependent on the central server's stability. (b) Our decentralized framework, dFLMoE, eliminates the central server and centralized aggregation by having clients exchange knowledge directly with each other. Each client then uses a Mixture of Experts (MoE) approach to adaptively combine local and received knowledge.
  • Figure 2: Overview of our proposed dFLMoE framework. In each training phase, we first train the local network (body and head) while freezing the parameters of the MoE module (top right). Then, clients send and receive heads to share knowledge (bottom). Finally, we make a Mixture-of-Experts (MoE) decision by training the feature space transform and the MoE network while freezing all other parameters, including the local body and all the heads. More details can be found in the method section.
  • Figure 3: The structure of Mixture of Experts and Feature Space Transform. Firstly, the Feature Space Transform converts the local body feature into the feature space corresponding to each expert. Then, each feature obtains the final prediction through the respective expert, and we collect all predictions as the Key $K$ and Value $V$. Next, we generate the query $Q$ using the local body feature through a linear layer. Finally, we perform the attention mechanism with $Q$, $K$, and $V$ to obtain the final predictions.
  • Figure 4: Visualized comparison of federated learning methods on medical image super-resolution. We randomly select two samples at different resolutions (x8↓ and x4↓) for the visualization. Shown are super-resolution results for FedAVG, SCAFFOLD, FedProx, LG-FedAvg, FedRep, and our method in two variants, dFLMoE (RCNN) and dFLMoE (SRResNet). Our framework recovers more detail.
  • Figure 5: Visualized comparison of Federated Learning in medical image segmentation. We randomly select three samples from different clients to form the visualization. (a-k) Segmentation results for FedAVG, SCAFFOLD, FedProx, Ditto, APFL, LG-FedAvg, FedRep, FedSM, LC-Fed, MH-FLID and our method dFLMoE; (l) Ground truths (denoted as ‘GT’).
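
The expert fusion described in the Figure 3 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature space transforms, the heads, and the query projection are all modeled as plain linear maps, the attention is assumed to be scaled dot-product with a softmax over experts, and the names `moe_predict`, `transforms`, `heads`, and `W_q` are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def moe_predict(feature, transforms, heads, W_q):
    """Fuse expert predictions for one local body feature.

    Assumption: every component is a linear map; the paper's modules may differ.
    """
    # 1. Feature space transform: map the local feature into each expert's space.
    # 2. Each expert head turns its transformed feature into a prediction.
    preds = [matvec(head, matvec(T, feature)) for T, head in zip(transforms, heads)]
    # 3. The collected predictions serve as both keys K and values V.
    # 4. The query Q comes from the local body feature through a linear layer.
    q = matvec(W_q, feature)
    # 5. Attention over experts (scaled dot-product is an assumption here).
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in preds]
    weights = softmax(scores)
    fused = [sum(w * p[c] for w, p in zip(weights, preds)) for c in range(len(preds[0]))]
    return fused, weights


# Toy usage: one local head plus three received heads act as four experts.
rng = random.Random(0)
feat_dim, num_classes, num_experts = 8, 3, 4
feature = [rng.uniform(-1.0, 1.0) for _ in range(feat_dim)]
transforms = [rand_matrix(feat_dim, feat_dim, rng) for _ in range(num_experts)]
heads = [rand_matrix(num_classes, feat_dim, rng) for _ in range(num_experts)]
W_q = rand_matrix(num_classes, feat_dim, rng)
fused, weights = moe_predict(feature, transforms, heads, W_q)
```

In this sketch the attention weights form a convex combination over experts, so each fused output lies between the smallest and largest expert prediction for that class. Following the Figure 2 caption, only `transforms` and `W_q` would be trained in the MoE phase, with the body and all heads frozen.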