MedForge: Building Medical Foundation Models Like Open Source Software Development
Zheling Tan, Kexin Ding, Jin Gao, Mu Zhou, Dimitris Metaxas, Shaoting Zhang, Dequan Wang
TL;DR
MedForge tackles data silos and privacy constraints in medical foundation model development by enabling asynchronous, community-driven model merging. The framework uses task-specific LoRA plugin modules and distilled datasets to integrate knowledge from multiple clinical centers without sharing raw data. Two merging strategies, MedForge-Fusion and MedForge-Mixture, provide flexible paths for incremental knowledge integration; Mixture offers robustness by aggregating plugin outputs rather than directly altering plugin parameters. Empirical results on BreakHis, LC25000, and MedFMC-Colon show that MedForge-Mixture achieves higher accuracy and AUC than single-task and other collaborative baselines, demonstrating the approach’s potential for scalable, privacy-preserving, multi-task clinical AI development.
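The difference between merging parameters and aggregating outputs can be illustrated with a toy example. The sketch below is not the MedForge implementation: it assumes that a Fusion-style merge takes a weighted sum of the plugins' low-rank factors, that a Mixture-style merge takes a weighted sum of the plugins' outputs, and that the merge weights are given (how they would be obtained, for example from the distilled datasets, is not shown). All function names and values are hypothetical.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def weighted_sum(mats, weights):
    """Element-wise weighted sum of same-shaped matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(w * M[i][j] for w, M in zip(weights, mats))
             for j in range(cols)] for i in range(rows)]

def plugin_output(A, B, x):
    """Low-rank update applied to x: B (A x)."""
    return matvec(B, matvec(A, x))

def fusion_output(plugins, weights, x):
    """Assumed parameter-level merge: combine the A and B factors, apply once."""
    A = weighted_sum([p[0] for p in plugins], weights)
    B = weighted_sum([p[1] for p in plugins], weights)
    return plugin_output(A, B, x)

def mixture_output(plugins, weights, x):
    """Assumed output-level merge: run each plugin untouched, weight its output."""
    outs = [plugin_output(A, B, x) for A, B in plugins]
    return [sum(w * o[i] for w, o in zip(weights, outs))
            for i in range(len(outs[0]))]

# Two hypothetical rank-1 plugins, each stored as a pair (A, B).
plugins = [
    ([[1.0, 0.0]], [[1.0], [0.0]]),  # plugin from a first center
    ([[0.0, 1.0]], [[0.0], [1.0]]),  # plugin from a second center
]
x = [2.0, 4.0]
weights = [0.5, 0.5]

fused = fusion_output(plugins, weights, x)   # [1.5, 1.5]
mixed = mixture_output(plugins, weights, x)  # [1.0, 2.0]
print(fused, mixed)
```

In this toy setting the two results differ because multiplying merged factors introduces cross terms between plugins, while output aggregation keeps each plugin's contribution separate. When all weight is placed on a single plugin, both functions return that plugin's output.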
Abstract
Foundation models (FMs) have made significant strides in the healthcare domain. Yet data silos and privacy concerns persist in healthcare systems, hindering safe medical data sharing and collaborative model development among institutions. The collection and curation of large-scale clinical datasets has increasingly become the bottleneck for training strong FMs. In this study, we propose Medical Foundation Models Merging (MedForge), a cooperative framework that enables community-driven medical foundation model development while preventing leakage of raw patient data and easing the synchronization issues that arise when clinical institutions develop models together. MedForge offers a bottom-up model construction mechanism by flexibly merging task-specific Low-Rank Adaptation (LoRA) modules, which adapt to downstream tasks while leaving the original model parameters unchanged. Through an asynchronous LoRA module integration scheme, the resulting composite model can progressively improve its overall performance across clinical tasks. MedForge shows strong performance on multiple clinical datasets (e.g., breast cancer, lung cancer, and colon cancer) collected from different institutions. Our major findings highlight the value of collaborative foundation models in advancing multi-center clinical collaboration. Our code is publicly available at https://github.com/TanZheling/MedForge.
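As background on why a LoRA module can adapt a model without touching its original parameters, the following minimal sketch shows the standard LoRA forward pass for a single linear layer, y = W x + s · B (A x), where W is frozen and only the low-rank factors A and B are trained. It illustrates the general technique only and is not taken from the MedForge code; the sizes, the scale s, and the function names are assumptions chosen for the example.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, scale=1.0):
    """Return W x + scale * B (A x). W is only read, never modified."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]

random.seed(0)
d_in, d_out, rank = 4, 3, 2  # illustrative sizes only

# Frozen base weight, shape (d_out, d_in).
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable low-rank factors: A is (rank, d_in), B is (d_out, rank).
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # zero init: the update starts at zero

x = [1.0, 2.0, 3.0, 4.0]
W_before = [row[:] for row in W]

# With B at zero, the adapted layer reproduces the base layer.
y_untrained = lora_forward(W, A, B, x)

# Stand-in for a training step: only the low-rank factor B changes.
B[0][0] = 0.5
y_adapted = lora_forward(W, A, B, x)

print(y_untrained, y_adapted, W == W_before)
```

Because only A and B change, a center could in principle share its plugin as this small pair of matrices while every participant keeps the same frozen base model.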
