Federated brain tumor segmentation: an extensive benchmark

Matthis Manthe, Stefan Duffner, Carole Lartizien

TL;DR

This paper benchmarks federated learning algorithms from all three classes (global, personalized, and hybrid) on the Federated Brain Tumor Segmentation 2022 (FeTS2022) dataset. It shows that some methods from each category can bring a slight performance improvement and potentially limit the bias of the final model(s) toward the predominant data distribution of the federation.

Abstract

Recently, federated learning has attracted increasing interest in the medical image analysis field due to its ability to aggregate multi-center data with privacy-preserving properties. A large number of federated training schemes have been published, which we categorize into global (one final model), personalized (one model per institution), or hybrid (one model per cluster of institutions) methods. However, their applicability to the recently published Federated Brain Tumor Segmentation 2022 (FeTS2022) dataset has not been explored yet. We propose an extensive benchmark of federated learning algorithms from all three classes on this task. While standard FedAvg already performs very well, we show that some methods from each category can bring a slight performance improvement and potentially limit the bias of the final model(s) toward the predominant data distribution of the federation. Moreover, we provide a deeper understanding of the behaviour of federated learning on this task through alternative ways of distributing the pooled dataset among institutions, namely an Independent and Identically Distributed (IID) setup and a limited data setup.
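To make the "bias toward the predominant data distribution" concrete, here is a minimal sketch of the standard FedAvg aggregation step, in which each institution's parameters are weighted by its share of the training samples. The function name `fedavg`, the toy parameter vectors, and the sample counts are hypothetical and are not taken from the paper; real models would average full network weights rather than two-element lists.

```python
def fedavg(client_weights, client_sizes):
    """Return the sample-size-weighted average of client parameter vectors."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        share = size / total  # institution's fraction of all samples
        for i, w in enumerate(weights):
            averaged[i] += share * w
    return averaged


# Hypothetical toy federation: one large institution and two small ones.
clients = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
sizes = [80, 10, 10]
global_model = fedavg(clients, sizes)
```

In this toy case the global parameters land close to those of the large institution, even though two of the three institutions agree with each other, which is the kind of imbalance that the personalized and hybrid methods aim to counteract.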

Paper Structure

This paper contains 60 sections, 34 equations, 10 figures, 4 tables, 1 algorithm.

Figures (10)

  • Figure 1: Number of samples and tumor grade distribution per institution in the challenge setup of the FeTS2022 dataset.
  • Figure 2: Label volume distribution per institution in the FeTS2022 dataset.
  • Figure 3: Number of samples and tumor grade distribution per institution in the limited setup of the FeTS2022 dataset. The complete federated subset is composed of 278 patients.
  • Figure 4: Average Dice scores per institution for local training, centralized training, and FedAvg variants in the challenge setup of the FeTS2022 dataset, in decreasing order of number of samples (cf. Figure 1). Institutions 17 to 9 (on the right) each own from 10 to 4 samples in total. Error bars represent $\pm$ one standard deviation. Centralized training (blue) is the upper baseline. Every variant of FedAvg shows a drop in performance compared to Centralized, with significantly larger drops for institutions 12, 13, 14, 15 and 3. FedAvg with fixed epochs (orange) converges faster than the other variants to a close match with Centralized performance, at the cost of larger gaps for the previously cited institutions.
  • Figure 5: Average Dice scores per institution for centralized and state-of-the-art global federated training in the challenge setup of the FeTS2022 dataset, in decreasing order of number of samples (cf. Figure 1). Institutions 17 to 9 each own from 10 to 4 samples in total. Error bars represent $\pm$ one standard deviation. Only SCAFFOLD (red) improves upon the federated baseline FedAvg (orange), mitigating the drop in performance for institutions 12, 13, 14, 15 and 3 while maintaining performance on the others. q-FedAvg (pink) and FedNova (green) homogenize Dice scores across institutions but decrease the overall performance.
  • ...and 5 more figures
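
The figures above report average Dice scores with error bars of one standard deviation. As a reminder of what is being measured, the following sketch computes the standard Dice coefficient for a pair of flat binary masks and then the mean and standard deviation over a few cases. The masks are made-up toy values, the empty-mask convention (returning 1.0) is an assumption, and the paper's exact evaluation protocol (tumor regions, averaging scheme) is not specified in this section.

```python
from statistics import mean, pstdev


def dice_score(pred, target):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for flat binary (0/1) masks."""
    intersection = sum(p and t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2.0 * intersection / denom


# Hypothetical (prediction, ground truth) pairs for one institution.
cases = [
    ([1, 1, 0, 0], [1, 1, 0, 0]),  # perfect overlap
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # one extra predicted voxel
    ([0, 0, 1, 1], [1, 1, 0, 0]),  # no overlap
]
scores = [dice_score(p, t) for p, t in cases]
avg, std = mean(scores), pstdev(scores)
```

Here `pstdev` is the population standard deviation; whether the paper uses the population or sample form is not stated in this section.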