MAFM^3: Modular Adaptation of Foundation Models for Multi-Modal Medical AI

Mohammad Areeb Qazi, Munachiso S Nwadike, Ibrahim Almakky, Mohammad Yaqub, Numan Saeed

TL;DR

MAFM^3 addresses data scarcity and modality variability in medical imaging by proposing a modular framework that extends a frozen foundation model with lightweight, selectively activatable components for classification, prognosis, and segmentation across CT, PET, and reports. The method combines within-model LoRA adapters, post-model decoders, and resolution-aware embeddings to enable cumulative, forgetting-free growth across tasks and modalities. Empirical results on the HECKTOR dataset show consistent gains in prognosis (C-index up to 0.721) and segmentation (Dice up to 65.7%) with modest parameter overhead, and improved robustness to domain shifts. The work suggests a practical path toward scalable, generalist medical AI that can evolve with clinical needs while minimizing retraining.
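The two metrics quoted above can be made concrete with a short sketch. The code below uses the standard textbook definitions of the Dice coefficient and Harrell's concordance index; it is not taken from the authors' evaluation code, and the function names and the empty-mask convention are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice overlap between two flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # higher risk, earlier event
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risks get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.721 sits between the two; Dice ranges from 0 (no overlap) to 1 (identical masks).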

Abstract

Foundation models are trained on extensive datasets to capture the general trends of a domain. However, in medical imaging, the scarcity of data makes pre-training for every domain, modality, or task challenging. Instead of building separate models, we propose MAFM^3 (Modular Adaptation of Foundation Models for Multi-Modal Medical AI), a framework that enables a single foundation model to expand into diverse domains, tasks, and modalities through lightweight modular components. These components serve as specialized skill sets that allow the system to flexibly activate the appropriate capability at inference time, depending on the input type or clinical objective. Unlike conventional adaptation methods that treat each new task or modality in isolation, MAFM^3 provides a unified and expandable framework for efficient multitask and multimodality adaptation. Empirically, we validate our approach by adapting a chest CT foundation model initially trained for classification into prognosis and segmentation modules. Our results show improved performance on both tasks. Furthermore, by incorporating PET scans, MAFM^3 achieved a 5% improvement in Dice score compared to the respective baselines. These findings establish that foundation models, when equipped with modular components, are not inherently constrained to their initial training scope but can evolve into multitask, multimodality systems for medical imaging. The code implementation of this work can be found at https://github.com/Areeb2735/CTscan_prognosis_VLM

Paper Structure

This paper contains 30 sections, 12 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Overview of the objective behind MAFM$^{3}$. A clinician provides diverse medical data sources such as CT scans, PET scans, and electronic health records (EHR). The Medical Foundational Model serves as a central knowledge base, from which task-specific experts (classification, prognosis, segmentation) can be activated depending on the input and clinical requirement. This modular design enables flexible adaptation of a single foundational model to multiple modalities and tasks, supporting generalist medical AI applications.
  • Figure 2: Conceptual overview of MAFM$^{3}$. A single foundational model trained on Domain-1 CT+Reports for classification is adapted with lightweight modules (LoRA, adapters, decoder) for new domains, inputs, and tasks. Each panel illustrates an independent adaptation scenario, highlighting the modular extensibility of the framework.
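
The modular adaptation described in Figure 2 can be illustrated with a minimal sketch of a frozen linear layer that carries several named low-rank (LoRA-style) adapters, one of which is selected per call. This is an assumption-laden toy in pure Python, not the authors' implementation: the class `LoRALinear`, the helper `matvec`, the adapter names, and the initialization constants are all hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight plus optional, selectively activated low-rank updates."""

    def __init__(self, weight):
        self.weight = weight   # frozen base weight, never modified here
        self.adapters = {}     # adapter name -> low-rank factors

    def add_adapter(self, name, rank, scale=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(self.weight), len(self.weight[0])
        # Down-projection starts small and random; up-projection starts at
        # zero, so a fresh adapter leaves the base output unchanged.
        down = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)]
                for _ in range(rank)]
        up = [[0.0] * rank for _ in range(out_dim)]
        self.adapters[name] = {"down": down, "up": up, "scale": scale}

    def forward(self, x, active=None):
        y = matvec(self.weight, x)
        if active is None:
            return y           # original capability, untouched
        adapter = self.adapters[active]
        delta = matvec(adapter["up"], matvec(adapter["down"], x))
        return [yi + adapter["scale"] * di for yi, di in zip(y, delta)]


layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("prognosis", rank=1)
layer.add_adapter("segmentation", rank=1, seed=1)
base_out = layer.forward([2.0, 3.0])
prognosis_out = layer.forward([2.0, 3.0], active="prognosis")
```

Because the base weight is never updated and each adapter is stored under its own name, adding a new adapter cannot change the output of the base path or of an existing adapter, which is the property that makes this style of growth free of forgetting in the sketch.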