Federated Learning: A new frontier in the exploration of multi-institutional medical imaging data
Dominika Ciupek, Maciej Malawski, Tomasz Pieciak
TL;DR
This review frames federated learning (FL) as a practical path to harnessing multi-institution medical imaging data while preserving patient privacy. It surveys FL theory, aggregation and learning algorithms, data privacy techniques, system architectures, and real-world clinical deployments, emphasizing challenges that range from data and model heterogeneity to malicious actors and communication constraints. The authors synthesize aggregation and learning methods specific to medical imaging, discuss open-source frameworks, and illustrate real-world deployments and their hurdles, offering issue–method–effect mappings and guidance for future development. The work highlights the central role of FedAvg, personalized and split-learning strategies, and privacy-preserving techniques in advancing clinically relevant FL systems across diverse imaging modalities. Overall, FL is positioned as a transformative framework for secure, scalable, multi-site medical imaging AI, with actionable directions for standardization, validation, and deployment in clinical workflows.
Abstract
Artificial intelligence has transformed medical imaging, leading to a genuine technological revolution in modern computer-assisted healthcare systems. However, the now ubiquitous deep learning (DL) systems require access to considerable amounts of data to extract knowledge properly and to generalize. Access to such extensive resources may be hindered by the time and effort required to obtain ethical agreements, set up and carry out the acquisition procedures, and manage the datasets adequately, with particular emphasis on proper anonymization. One of the pivotal challenges in the DL field is integrating data from various sources, acquired with hardware from different vendors, under diverse acquisition protocols and experimental setups, and subject to inter-operator variability. In this paper, we review the federated learning (FL) concept, which fosters the integration of large-scale heterogeneous datasets from multiple institutions in training DL models. In contrast to a centralized approach, the decentralized FL procedure trains DL models while preserving data privacy at each institution involved. We formulate the FL principle and comprehensively review general and medical imaging-specific aggregation and learning algorithms that enable the generation of a globally generalized model. We examine in detail the challenges in constructing FL-based systems, such as data and model heterogeneity across institutions, resilience to potential attacks on data privacy, and variability in computational and communication resources among the participating sites, which might reduce the efficiency of the entire system. Finally, we explore up-to-date open frameworks for rapid prototyping of FL-based algorithms, comprehensively present real-world implementations of FL systems, and shed light on future directions in this rapidly growing field.
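To make the aggregation idea concrete, the following is a minimal sketch of a FedAvg-style server step, in which each institution sends only its locally trained parameters and the server averages them weighted by local sample count. It is an illustration under stated assumptions, not the implementation reviewed in the paper: the function name `fedavg`, the flat-list parameter layout, and the two hypothetical hospitals with their sample counts are all invented for this example.

```python
from typing import Dict, List

# Assumption: a model is a mapping from parameter name to a flat list of floats.
Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    global_weights: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        # Weighted mean of each scalar parameter across all clients.
        global_weights[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return global_weights


# Hypothetical example: two institutions share parameters, never images.
hospital_a = {"layer": [1.0, 2.0]}  # trained locally on 100 scans (assumed)
hospital_b = {"layer": [3.0, 4.0]}  # trained locally on 300 scans (assumed)
global_model = fedavg([hospital_a, hospital_b], [100, 300])
print(global_model)  # {'layer': [2.5, 3.5]}
```

In a full FL round the server would send `global_model` back to each site for further local training; the sketch covers only the aggregation step and omits communication, privacy mechanisms, and handling of heterogeneous models.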
