A cautionary tale on the cost-effectiveness of collaborative AI in real-world medical applications

Francesco Cremonesi, Lucia Innocenti, Sebastien Ourselin, Vicky Goh, Michela Antonelli, Marco Lorenzi

TL;DR

This study evaluates collaborative AI in healthcare under practical conditions by comparing six federated learning (FL) methods with five consensus-based learning (CBL) approaches across seven diverse medical datasets and tasks. It shows that CBL can match FL in accuracy while substantially reducing training time (about 15x) and network bandwidth (about 60x), enabling more sustainable and accessible collaborations. Although neither paradigm consistently outperforms the other in accuracy, CBL's asynchronous, modular nature reduces deployment complexity and hardware demands, suggesting a practical pathway for real-world adoption. The work also highlights privacy considerations and calls for quantitative system-level metrics, including energy and CO2 implications, to guide future deployments of collaborative AI in medicine.

Abstract

Background. Federated learning (FL) has gained wide popularity as a collaborative learning paradigm enabling collaborative AI in sensitive healthcare applications. Nevertheless, the practical implementation of FL presents technical and organizational challenges, as it generally requires complex communication infrastructures. In this context, consensus-based learning (CBL) may represent a promising collaborative learning alternative, thanks to its ability to combine local knowledge into a federated decision system while potentially reducing deployment overhead. Methods. In this work we propose an extensive benchmark of the accuracy and cost-effectiveness of a panel of FL and CBL methods in a wide range of collaborative medical data analysis scenarios. The benchmark includes 7 different medical datasets, encompassing 3 machine learning tasks, 8 different data modalities, and multi-centric settings involving 3 to 23 clients. Findings. Our results reveal that CBL is a cost-effective alternative to FL. When compared across the panel of medical datasets in the considered benchmark, CBL methods provide accuracy equivalent to that achieved by FL. Moreover, CBL significantly reduces training time and communication cost (15-fold and 60-fold decreases, respectively; p < 0.05). Interpretation. This study opens a novel perspective on the deployment of collaborative AI in real-world applications, where the adoption of cost-effective methods is instrumental to achieving sustainability and democratisation of AI by alleviating the need for extensive computational resources.

Paper Structure

This paper contains 26 sections, 1 equation, 7 figures, 14 tables.

Figures (7)

  • Figure 1: Training and inference phases for federated learning (FL, on the left) and consensus-based learning (CBL, on the right). In FL, training is performed collaboratively to produce a common global model across clients. The global model is subsequently used for inference on new data instances. CBL instead requires each client to train a model independently on its own local data. Inference on new data instances is performed collaboratively through consensus.
  • Figure 2: Results obtained by centralized learning (green), local learning (blue), federated learning (orange), and consensus-based learning (brown) methods. The boxplots represent the accuracy across test sets for centralized learning, local models, and collaborative learning (CL) methods. For FeTS, local accuracy results are aggregated due to the large number of clients.
  • Figure 3: Examples of images from the FedProstate dataset showing the heterogeneity among different clients.
  • Figure 4: Distribution of dataset sizes across all clients in the FeTS dataset.
  • Figure 5: Examples of different modalities for one patient in the FeTS dataset. MASK represents the segmentation mask, which is used as ground truth for our segmentation problem.
  • ...and 2 more figures
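
The distinction drawn in the Figure 1 caption can be illustrated with a minimal sketch. This text does not name the six FL and five CBL methods that were benchmarked, so the two aggregation rules below (size-weighted parameter averaging in the style of FedAvg, and majority voting over hard labels) are assumptions chosen for illustration, not the authors' implementation. The function names and toy values are hypothetical.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FL-style aggregation (assumed): size-weighted average of client parameters.

    In FL this step runs during training, once per communication round.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)            # shape: (n_clients, n_params)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()

def majority_vote(client_predictions):
    """CBL-style aggregation (assumed): per-sample majority vote over hard labels.

    In CBL the models are trained independently; only predictions are combined,
    and only at inference time.
    """
    preds = np.asarray(client_predictions)        # shape: (n_clients, n_samples)
    n_classes = int(preds.max()) + 1
    counts = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)                  # ties go to the lowest class index

# FL: three clients send parameter vectors; the server builds one global model.
weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
global_weights = fedavg(weights, client_sizes=[10, 30, 60])

# CBL: three locally trained models each label four new samples; labels are merged.
votes = [[0, 1, 1, 2], [0, 1, 0, 2], [1, 1, 0, 2]]
consensus = majority_vote(votes)

print(global_weights)   # [4. 5.]
print(consensus)        # [0 1 0 2]
```

In this sketch the FL path exchanges full parameter vectors every round, whereas the CBL path exchanges only per-sample predictions once. That difference in what is transmitted, and when, is the kind of gap the reported communication savings refer to.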