Federated Distillation for Medical Image Classification: Towards Trustworthy Computer-Aided Diagnosis

Sufen Ren, Yule Hu, Shengchao Chen, Guanjun Wang

TL;DR

This work tackles privacy-preserving medical image classification in federated settings with severe data heterogeneity and limited resources. It introduces FedMIC, which combines Dual Knowledge Distillation (local teacher-student learning with representation- and decision-level distillation) and Global Parameter Decomposition (low-rank parameter updates with dynamic singular-value selection) to deliver personalized models while minimizing data transfer. A generalization bound for the distributed setting provides the theoretical guarantee, and extensive experiments on four public medical image classification datasets show that FedMIC outperforms state-of-the-art federated learning baselines, especially under non-IID distributions and low client participation. The approach supports trustworthy computer-aided diagnosis in resource-constrained healthcare environments by reducing communication overhead and preserving data privacy without sacrificing accuracy.
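The low-rank parameter update with dynamic singular-value selection can be illustrated with a truncated SVD. The following is a minimal sketch, not the paper's implementation: the function names `compress` and `reconstruct`, the energy-threshold rule for choosing the rank, and the synthetic weight matrix are all assumptions made for illustration.

```python
import numpy as np

def compress(weight, energy=0.9):
    """Truncated SVD that keeps the fewest singular values whose squared
    sum reaches the fraction `energy` of the total (assumed selection rule)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    total = np.sum(s ** 2)
    if total == 0.0:
        k = 1  # degenerate all-zero matrix: keep a single component
    else:
        cumulative = np.cumsum(s ** 2) / total
        # First index where the cumulative energy reaches the threshold.
        k = min(int(np.searchsorted(cumulative, energy)) + 1, len(s))
    return u[:, :k], s[:k], vt[:k, :]

def reconstruct(u, s, vt):
    """Rebuild the (approximate) weight matrix from its truncated factors."""
    return (u * s) @ vt

rng = np.random.default_rng(0)
# Synthetic, approximately rank-4 "layer weight" (hypothetical data).
weight = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 32))
weight += 0.01 * rng.normal(size=(64, 32))

u, s, vt = compress(weight, energy=0.99)
approx = reconstruct(u, s, vt)

sent = u.size + s.size + vt.size  # numbers a client would upload
rel_err = np.linalg.norm(weight - approx) / np.linalg.norm(weight)
print(f"rank kept: {s.size}, values sent: {sent} of {weight.size}")
print(f"relative reconstruction error: {rel_err:.4f}")
```

Under this rule the relative Frobenius error is bounded by the square root of one minus the energy threshold, so the threshold trades upload size directly against reconstruction fidelity.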

Abstract

Medical image classification plays a crucial role in computer-aided clinical diagnosis. While deep learning techniques have significantly enhanced efficiency and reduced costs, the privacy-sensitive nature of medical imaging data complicates centralized storage and model training. Furthermore, low-resource healthcare organizations face challenges related to communication overhead and efficiency due to increasing data and model scales. This paper proposes a novel privacy-preserving medical image classification framework based on federated learning to address these issues, named FedMIC. The framework enables healthcare organizations to learn from both global and local knowledge, enhancing local representation of private data despite statistical heterogeneity. It provides customized models for organizations with diverse data distributions while minimizing communication overhead and improving efficiency without compromising performance. Our FedMIC enhances robustness and practical applicability under resource-constrained conditions. We demonstrate FedMIC's effectiveness using four public medical image datasets for classical medical image classification tasks.
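Learning from both global and local knowledge is realized through distillation between a teacher and a student model at two levels. The sketch below shows one common way to write such an objective, assuming a mean-squared-error term on representations and a temperature-softened KL term on decisions; the weights `alpha` and `beta`, the temperature, and all function names are hypothetical, and the auxiliary matrices and bi-directional correction used by FedMIC are not modeled here.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the class axis."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def cross_entropy(logits, labels):
    """Mean classification loss for integer class labels."""
    logp = log_softmax(logits)
    return -logp[np.arange(len(labels)), labels].mean()

def representation_distillation(student_feat, teacher_feat):
    """Representation-level term: MSE between feature vectors (assumed form)."""
    return np.mean((student_feat - teacher_feat) ** 2)

def decision_distillation(student_logits, teacher_logits, temperature=2.0):
    """Decision-level term: KL(teacher || student) on softened predictions."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    return np.mean(np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=1))

def student_loss(s_feat, s_logits, t_feat, t_logits, labels, alpha=1.0, beta=1.0):
    """Classification loss plus the two weighted distillation terms."""
    return (cross_entropy(s_logits, labels)
            + alpha * representation_distillation(s_feat, t_feat)
            + beta * decision_distillation(s_logits, t_logits))

rng = np.random.default_rng(0)
batch, dim, classes = 8, 16, 5          # hypothetical sizes
labels = rng.integers(0, classes, size=batch)
s_feat, t_feat = rng.normal(size=(2, batch, dim))
s_logits, t_logits = rng.normal(size=(2, batch, classes))

loss = student_loss(s_feat, s_logits, t_feat, t_logits, labels)
print(f"student loss: {loss:.4f}")
```

Both distillation terms vanish when the student matches the teacher exactly, so in that case the objective reduces to the plain classification loss.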

Paper Structure

This paper contains 31 sections, 1 theorem, 24 equations, 3 figures, 9 tables, 2 algorithms.

Key Result

Theorem 1

Consider an on-device medical image classification system with $m$ clients (devices). Let ${\mathcal{D}}_1, {\mathcal{D}}_2, \dots, {\mathcal{D}}_m$ be the true data distributions and $\hat{\mathcal{D}}_1, \hat{\mathcal{D}}_2, \dots, \hat{\mathcal{D}}_m$ be the empirical data distributions. Denote t

Figures (3)

  • Figure 1: Schematic diagram of FedMIC. Each healthcare organization acts as an independent client whose private medical images remain unshared throughout training. Clients train their local models exclusively on local data before transmitting parameters to the central server. FedMIC significantly reduces communication overhead by transmitting only a small subset of parameters from the decomposition matrix rather than the entire local model. The server reconstructs and aggregates the uploaded parameters within its service area, then broadcasts the aggregated parameters to all clients for the next round of training and communication.
  • Figure 2: Schematic diagram of the local update process in FedMIC. The local update comprises two phases: (1) local model updating and (2) parameter flow of the updated local model. In phase one, the same local medical images are fed into both the student and the teacher model. The extracted representations are used to compute the representational distillation loss, which enables bi-directional correction. The representations are then passed to the corresponding model heads for decision-making, where the decision distillation loss is calculated using two independent auxiliary matrices. Training thus combines each model's classification loss with the decision distillation loss between student and teacher. In phase two, the trained student model's parameters undergo low-rank decomposition and matrix decomposition, producing a decomposed parameter matrix that is uploaded to the server. Reconstruction applies the inverse of this processing to the parameters, which the server subsequently broadcasts to the clients.
  • Figure 3: Visualization of the data distributions of the four datasets under different degrees of Non-IID. From top to bottom are BloodMNIST, TissueMNIST, OrganMNIST (2D) and OrganMNIST (2D); from left to right, the Dirichlet distribution parameter takes the values $\lambda \in \{0.1, 0.3, 0.5\}$. The smaller $\lambda$ is, the more strongly Non-IID the data are across clients. A larger red circle means that the client holds more samples of that class, and a smaller circle means it holds fewer.
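The Non-IID splits shown in Figure 3 are controlled by a Dirichlet parameter $\lambda$. A widely used way to generate such splits is to draw, for each class, the clients' shares from a symmetric Dirichlet distribution. The sketch below follows that recipe as an assumption; the paper's exact partitioning procedure is not given here, and `dirichlet_partition`, the client count, and the synthetic labels are hypothetical.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, lam, seed=0):
    """Split sample indices across clients, class by class, with per-class
    client shares drawn from a symmetric Dirichlet(lam) distribution."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        shares = rng.dirichlet(np.full(num_clients, lam))
        # Convert cumulative shares into split points within this class.
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(ix, dtype=int) for ix in client_indices]

# Synthetic labels standing in for a dataset (hypothetical: 8 classes).
labels = np.random.default_rng(1).integers(0, 8, size=2000)

for lam in (0.1, 0.3, 0.5):
    parts = dirichlet_partition(labels, num_clients=10, lam=lam)
    # Per-client class histogram: rows are clients, columns are classes.
    counts = np.stack([np.bincount(labels[p], minlength=8) for p in parts])
    print(f"lambda={lam}: client sizes {counts.sum(axis=1).tolist()}")
```

Smaller values of `lam` concentrate each class on fewer clients, which corresponds to the stronger heterogeneity described in the caption.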

Theorems & Definitions (1)

  • Theorem 1: Generalization Bound of FedMIC