Universal Medical Imaging Model for Domain Generalization with Data Privacy

Ahmed Radwan, Islam Osman, Mohamed S. Shehata

TL;DR

Domain generalization in medical imaging is hampered by data privacy and labeling constraints. The paper presents MedUniverse, a stage-wise federated learning framework that uses masked image modeling (MIM) for self-supervised pretraining and aggregates client updates without accessing raw data. Evaluated on eight diverse medical-imaging tasks with significant domain shifts, the approach achieves statistically significant improvements over a strong baseline by accumulating knowledge across stages. The method improves cross-domain robustness and trustworthiness while preserving patient confidentiality, offering a practical path toward privacy-aware, generalized medical imaging models.

Abstract

Achieving domain generalization in medical imaging is a significant challenge, primarily because few labeled datasets are publicly available in this domain. This scarcity stems from data-privacy concerns and from the medical expertise needed to label the data accurately. In this paper, we propose a federated learning approach that transfers knowledge from multiple local models to a global model, eliminating the need for direct access to the local datasets used to train each model. The primary objective is to train a global model capable of performing a wide variety of medical imaging tasks while preserving the confidentiality of the private datasets used to train these models. To validate the effectiveness of our approach, we conducted extensive experiments on eight datasets, each corresponding to a different medical imaging application. The clients' data distributions in our experiments vary significantly, as they originate from diverse domains. Despite this variation, we demonstrate a statistically significant improvement over a state-of-the-art baseline that uses masked image modeling on a diverse pre-training dataset spanning different body parts and scanning types. This improvement is achieved by curating information learned from clients without accessing any labeled dataset on the server.

Paper Structure

This paper contains 17 sections, 3 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Our stage-wise federated learning pipeline. 1) The server initializes its base model via self-supervised masked image modeling (MIM) on data spanning different parts of the human body, acquired with different scanning techniques, to maintain domain diversity. 2) Each client queries the server with the number of classes in its classification problem, and the server sends back its base model followed by a randomly initialized linear layer. 3) The client fine-tunes the model and returns it to the server. After receiving a predetermined number of clients' fine-tuned models, the server concludes the stage, aggregates the weights, and updates the base model as explained in the methodology. 4) The server then sends the updated model to the clients in the next stage.
  • Figure 2: Samples of the pretraining dataset.
  • Figure 3: The plot shows the average accuracy gain across the different stages. As more clients send their fine-tuned weights back to the server, the server can aggregate more knowledge and thus present a more robust model to clients in the next stages.
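
The stage-wise protocol described in the Figure 1 caption can be sketched roughly as below. This is only an illustrative sketch, not the authors' implementation: the aggregation rule is not given on this page, so a plain element-wise mean of the clients' base-model weights is assumed here, and it is also assumed that only the shared base model (not each client's task-specific linear layer) is aggregated. The names `Server`, `handle_query`, `receive`, `average_weights`, and `toy_fine_tune` are hypothetical, and weights are represented as plain lists of floats to keep the example dependency-free.

```python
import random
from dataclasses import dataclass, field

# Toy weight container: parameter name -> flat list of floats.
Weights = dict[str, list[float]]


def average_weights(models: list[Weights]) -> Weights:
    """ASSUMPTION: aggregate by element-wise mean; the paper's rule may differ."""
    n = len(models)
    return {
        name: [sum(m[name][i] for m in models) / n
               for i in range(len(models[0][name]))]
        for name in models[0]
    }


@dataclass
class Server:
    """Holds the shared base model and collects client updates for one stage."""
    base: Weights
    clients_per_stage: int  # the "pre-determined number of clients" per stage
    feature_dim: int        # output width of the base model (hypothetical)
    received: list[Weights] = field(default_factory=list)
    stage: int = 0

    def handle_query(self, num_classes: int) -> tuple[Weights, list[list[float]]]:
        """Step 2: return a copy of the base model plus a random linear layer."""
        head = [[random.gauss(0.0, 0.02) for _ in range(self.feature_dim)]
                for _ in range(num_classes)]
        base_copy = {name: list(values) for name, values in self.base.items()}
        return base_copy, head

    def receive(self, fine_tuned_base: Weights) -> bool:
        """Step 3: store a fine-tuned base model; close the stage when enough arrive.

        Returns True if this update concluded the stage.
        """
        self.received.append(fine_tuned_base)
        if len(self.received) < self.clients_per_stage:
            return False
        self.base = average_weights(self.received)  # assumed aggregation rule
        self.received = []
        self.stage += 1  # step 4: later clients receive the updated base
        return True


def toy_fine_tune(base: Weights, shift: float) -> Weights:
    """Hypothetical stand-in for local fine-tuning on a client's private data."""
    return {name: [v + shift for v in values] for name, values in base.items()}


if __name__ == "__main__":
    server = Server(base={"encoder": [0.0, 0.0, 0.0]},
                    clients_per_stage=2, feature_dim=3)
    for num_classes, shift in [(2, 0.1), (5, 0.3)]:
        base, head = server.handle_query(num_classes)
        server.receive(toy_fine_tune(base, shift))
    print(server.stage, server.base)
```

Only model weights cross the client boundary in this sketch; the private data stays inside `toy_fine_tune`, which mirrors the privacy argument made in the abstract. In a real system the weight dictionaries would be framework tensors and the local step would be actual supervised fine-tuning.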