MH-pFLID: Model Heterogeneous personalized Federated Learning via Injection and Distillation for Medical Data Analysis

Luyuan Xie, Manqing Lin, Tianyu Luan, Cong Li, Yuejian Fang, Qingni Shen, Zhonghai Wu

TL;DR

MH-pFLID tackles model-heterogeneous federated learning for medical data under non-IID distributions without relying on public datasets. It introduces a lightweight messenger model with injection and distillation, and dedicated receiver/transmitter modules to transfer knowledge efficiently between heterogeneous local models and the server. The framework demonstrates superior performance across medical image classification, segmentation, and time-series tasks, with low messenger overhead and strong generalizability to unseen clients. This approach enables practical, private, and scalable personalized FL for diverse medical institutions with varying hardware and architectures.

Abstract

Federated learning is widely used in medical applications to train global models without accessing local data. However, differences in computational capabilities and network architectures across clients (system heterogeneity) make it difficult to aggregate information effectively from non-independently and identically distributed (non-IID) data. Current federated learning methods based on knowledge distillation require public datasets, which raises privacy and data-collection concerns. These datasets also demand additional local computing and storage resources, a burden for medical institutions with limited hardware. In this paper, we introduce a novel federated learning paradigm, Model Heterogeneous personalized Federated Learning via Injection and Distillation (MH-pFLID). Our framework uses a lightweight messenger model that carries concentrated information collected from each client. We also develop a set of receiver and transmitter modules that receive information from, and send information to, the messenger model, so that information can be injected and distilled efficiently.
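
For illustration, the contrast with public-dataset distillation can be made concrete with a generic distillation loss computed only on a client's own private samples, with the local model as teacher and the messenger as student, mirroring the direction of step ② in Figure 2. This is a minimal sketch assuming standard temperature-scaled KL distillation (Hinton et al.); it is not the paper's Information Transmitter module, and the function names and default temperature are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: generic logit-level distillation, not necessarily the loss used by MH-pFLID.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_teacher, log_p_student))
    return kl * temperature ** 2


def batch_kd_loss(messenger_batch: Sequence[Sequence[float]],
                  local_batch: Sequence[Sequence[float]],
                  temperature: float = 2.0) -> float:
    """Average distillation loss over a batch of the client's own (private) samples.

    Hypothetical helper: the local model's logits act as the teacher and the
    messenger's logits as the student, so no public dataset is involved.
    """
    losses = [kd_loss(s, t, temperature) for s, t in zip(messenger_batch, local_batch)]
    return sum(losses) / len(losses)
```

The loss is zero when the messenger already matches the local model and positive otherwise, so minimizing it pulls the messenger toward the client's personalized predictions using only data that stays on the client.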

Paper Structure

This paper contains 31 sections, 15 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: (a) Previous methods such as KT-pFL [NEURIPS2021_5383c731] require an extensive public dataset on which to generate soft predictions that carry information from local clients to the server. In real applications, these methods are severely limited by the high cost of, and privacy concerns about, public medical datasets. (b) Our new framework, MH-pFLID, does not require such a public dataset for training. Instead, we use a lightweight messenger model to carry and transform information between the model-heterogeneous clients and the server.
  • Figure 2: Overview of our proposed MH-pFLID framework. Each training cycle consists of five steps: ① Knowledge injection stage. We design an Information Receiver module that uses the aggregated information in the messenger to train the local model. ② Knowledge distillation stage. We design an Information Transmitter module that transmits personalized information from the local model to the messenger. ③ Each client uploads its messenger parameters to the server. ④ The server aggregates the messengers using a weighted average strategy. ⑤ The aggregated messenger parameters are downloaded to each client. More details are given in the pipeline section and the information receiver/transmitter section.
  • Figure 3: The structure of information receiver (a) and information transmitter (b).
  • Figure 4: Visual comparison of federated learning methods for medical image segmentation. We randomly select four samples from different clients for the visualization. (a-j) Segmentation results of models trained with FedAVG, SCAFFOLD, FedProx, Ditto, APFL, LG-FedAvg, FedRep, FedSM, LC-Fed, and our method MH-pFLID; (k) Ground truths (denoted as ‘GT’).
  • Figure 5: t-SNE maps of features from the 7th client (DenseNet) on the breast cancer classification task (different label distributions), comparing injection & distillation with a simple add operation. Different colored dots represent different categories. (a) Features extracted by the local model body; (b) local model body features after injection & distillation; (c) the same features when the injection & distillation process is replaced with a simple add operation and the model is retrained. The features generated by injection & distillation are more distinguishable than those produced by the straightforward add design.
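
Steps ③ to ⑤ in Figure 2 form a server round in which only messenger parameters travel. The sketch below illustrates the aggregation and broadcast under stated assumptions: messenger parameters are represented as plain lists of floats keyed by tensor name, and the weights are local sample counts (the caption says only "weighted average strategy"). Parameter-wise averaging presupposes that every client uses the same messenger architecture, even though the local models differ. All names are hypothetical, and steps ① and ② are omitted.

```python
from typing import Dict, List

# Hypothetical representation: one flat list of floats per messenger tensor name.
Params = Dict[str, List[float]]


def aggregate_messengers(client_params: List[Params], client_weights: List[float]) -> Params:
    """Step ④: weighted average of the uploaded messenger parameters.

    Assumption: weights are local sample counts; they are normalized here.
    """
    if not client_params or len(client_params) != len(client_weights):
        raise ValueError("need at least one client and exactly one weight per client")
    total = float(sum(client_weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    aggregated: Params = {}
    for name, values in client_params[0].items():
        aggregated[name] = [
            sum(w * p[name][i] for p, w in zip(client_params, client_weights)) / total
            for i in range(len(values))
        ]
    return aggregated


def run_round(client_params: List[Params], client_weights: List[float]) -> List[Params]:
    """Steps ③ to ⑤: clients upload messengers (the input list), the server
    aggregates them, and each client downloads its own copy of the result."""
    global_messenger = aggregate_messengers(client_params, client_weights)
    return [{k: list(v) for k, v in global_messenger.items()} for _ in client_params]
```

For example, two clients with 100 and 300 samples whose messengers hold w = [1.0, 2.0] and w = [3.0, 6.0] both receive w = [2.5, 5.0] after `run_round`.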