
Federated Learning for Medical Image Classification: A Comprehensive Benchmark

Zhekai Zhou, Guibo Luo, Mingzhi Chen, Zhenyu Weng, Yuesheng Zhu

TL;DR

This work benchmarks a broad set of federated learning algorithms on medical image classification using real multi-center datasets, revealing that no single method consistently wins across all scenarios. It introduces a concise yet effective augmentation strategy that combines conditional denoising diffusion probabilistic models with label smoothing to augment local data while preserving privacy, yielding performance close to centralized training on many tasks. The study demonstrates that diffusion-based augmentation reduces feature shift among clients, accelerates convergence, and improves final accuracy, suggesting practical paths for deploying FL in clinical imaging. The authors provide actionable guidelines for method selection based on data volume, computation, and communication constraints, and release code to support future research and benchmarking in medical FL.
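Label smoothing, named above as one half of the augmentation strategy, replaces a one-hot training target with a softened distribution. The sketch below is minimal and assumes the standard formulation: with K classes and smoothing factor ε, the true class receives 1 - ε + ε/K and every other class receives ε/K. The value ε = 0.1 and the function names are illustrative choices, not the paper's settings.

```python
import math


def smooth_labels(label: int, num_classes: int, epsilon: float = 0.1) -> list[float]:
    """Label-smoothed target for one sample (standard formulation, assumed).

    Every class gets the uniform share epsilon / K, and the true class
    additionally gets 1 - epsilon, so the target still sums to 1.
    """
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    uniform = epsilon / num_classes
    target = [uniform] * num_classes
    target[label] += 1.0 - epsilon
    return target


def cross_entropy(logits: list[float], target: list[float]) -> float:
    """Cross-entropy between softmax(logits) and a (possibly soft) target."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # log softmax(z_i) = z_i - logsumexp(z); weight each term by its target mass
    return -sum(t * (z - log_sum_exp) for t, z in zip(target, logits))


# Example: 3-class problem, true class 2, epsilon = 0.1
target = smooth_labels(2, num_classes=3, epsilon=0.1)
print(target)
print(cross_entropy([0.0, 0.0, 10.0], target))
```

Under this formulation, a model that places nearly all probability on the true class still incurs a nonzero loss from the ε/K terms, which discourages overconfident predictions. This summary does not state how the paper weights the smoothing or whether it applies it only to generated samples.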

Abstract

The federated learning paradigm is well-suited to medical image analysis because it enables machine learning on isolated multi-center data while protecting the privacy of participating parties. However, current research on federated learning optimization algorithms often focuses on a limited set of datasets and scenarios, mostly natural images, with few comparative experiments in medical contexts. In this work, we conduct a comprehensive evaluation of several state-of-the-art federated learning algorithms in medical imaging, fairly comparing classification models trained with each algorithm across multiple medical imaging datasets. Additionally, we evaluate system performance metrics, such as communication cost and computational efficiency, under different federated learning architectures. Our findings show that medical imaging datasets pose substantial challenges for current federated learning optimization algorithms: no single algorithm consistently delivers optimal performance across all medical federated learning scenarios, and many optimization algorithms may underperform when applied to these datasets. Our experiments provide a benchmark and guidance for future research on, and application of, federated learning in medical imaging. Furthermore, we propose an efficient and robust method that combines generative data augmentation using denoising diffusion probabilistic models with label smoothing, broadly improving the performance of federated learning on classification tasks across various medical imaging datasets. Our code will be released on GitHub, offering a reliable and comprehensive benchmark for future federated learning studies in medical imaging.
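Communication cost, one of the system metrics mentioned above, depends on how much data each round moves between clients and the server. As a reference point, the following minimal sketch shows the FedAvg baseline, which Figure 1 identifies as the framework the proposed method builds on. In each round the server averages client parameters weighted by local sample counts, and the round costs roughly one full-model download plus one upload per participating client. The flattened list-of-floats model, the hypothetical `ClientUpdate` type, and the float32 (4 bytes per parameter) assumption are illustrative and do not come from the benchmark's implementation.

```python
from dataclasses import dataclass


@dataclass
class ClientUpdate:
    weights: list[float]  # flattened model parameters after local training (illustrative)
    num_samples: int      # size of the client's local training set


def fedavg_aggregate(updates: list[ClientUpdate]) -> list[float]:
    """FedAvg server step: average parameters weighted by local sample count."""
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(u.num_samples for u in updates)
    global_w = [0.0] * len(updates[0].weights)
    for u in updates:
        share = u.num_samples / total  # larger clients pull the average harder
        for i, w in enumerate(u.weights):
            global_w[i] += share * w
    return global_w


def round_traffic_bytes(num_params: int, num_clients: int, bytes_per_param: int = 4) -> int:
    """Approximate bytes moved in one round, assuming each client downloads
    and uploads the full model exactly once (float32 by default)."""
    return 2 * num_params * bytes_per_param * num_clients


updates = [ClientUpdate([1.0, 2.0], 100), ClientUpdate([3.0, 4.0], 300)]
print(fedavg_aggregate(updates))  # [2.5, 3.5]
print(round_traffic_bytes(11_000_000, 3))
```

For a hypothetical model with 11 million float32 parameters and three clients, this estimate gives 264,000,000 bytes per round. Algorithms that exchange additional state alongside the model would scale the upload or download term accordingly.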

Paper Structure

This paper contains 29 sections, 7 equations, 7 figures, 10 tables.

Figures (7)

  • Figure 1: Architecture of our proposed federated learning optimization method. Our method incorporates the generative technique of DDPM and label smoothing within the FedAvg framework.
  • Figure 2: The sub-datasets from the three regions contain varying numbers of images, with an average sample count of 176 and a standard deviation of 222.3.
  • Figure 3: The distribution of multi-center datasets. The x-axis represents pixel values, and the y-axis indicates their frequency across all images in the dataset. Each curve shows the pixel-value distribution of one sub-dataset.
  • Figure 4: The three partitioned client datasets from imbalanced NeoJaundice contain varying numbers of training samples, with an average of 596.3 samples and a standard deviation of 201.3. After DDPM-based image generation, training samples are evenly distributed across clients.
  • Figure 5: Distribution of image pixels in different client datasets of NeoJaundice.
  • ...and 2 more figures
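Figures 2 and 4 summarize client imbalance with a mean and a standard deviation, and Figure 4 notes that DDPM generation evens out client sizes. The following minimal sketch shows that bookkeeping under several assumptions. Each client is topped up with generated images until it matches the largest client, because the paper's exact balancing target is not given here. The sketch uses the population standard deviation, because the figures do not say which variant they report. The client names and counts are hypothetical and are not the paper's data.

```python
import statistics
from typing import Optional


def synthetic_quota(client_counts: dict[str, int], target: Optional[int] = None) -> dict[str, int]:
    """Number of generated images each client would need so that all clients
    reach the same training-set size.

    Assumption: the common size defaults to the largest client's count;
    pass `target` to use a different (hypothetical) goal.
    """
    goal = max(client_counts.values()) if target is None else target
    # Clients already at or above the goal receive no synthetic images.
    return {name: max(goal - n, 0) for name, n in client_counts.items()}


# Hypothetical per-client training-set sizes (not the paper's data)
counts = {"client_a": 400, "client_b": 600, "client_c": 800}
values = list(counts.values())
print(statistics.mean(values), statistics.pstdev(values))  # imbalance summary
quota = synthetic_quota(counts)
print(quota)
```

After the quotas are filled, every client holds the same number of samples, so the standard deviation across clients drops to zero. The sketch does not model generation quality or the label smoothing applied during training.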