Efficient Parameter Adaptation for Multi-Modal Medical Image Segmentation and Prognosis

Numan Saeed, Shahad Hardan, Muhammad Ridzuan, Nada Saadi, Karthik Nandakumar, Mohammad Yaqub

TL;DR

The paper addresses the challenge of building on widely available CT data so that PET scans and EHR data can later be integrated for tumor segmentation and prognosis. It introduces PEMMA, a parameter-efficient adaptation framework based on LoRA/DoRA that adds PET context via patch embeddings and PET skip pathways while freezing the base CT model. PEMMA achieves performance competitive with early fusion using only about 8% of the trainable parameters, and yields substantial gains in PET segmentation accuracy and in prognosis when PET, or PET and EHR, are added. The approach also supports continual learning for new centers and data distributions, offering a practical path toward flexible, resource-efficient multi-modal clinical AI.

Abstract

Cancer detection and prognosis rely heavily on medical imaging, particularly CT and PET scans. Deep Neural Networks (DNNs) have shown promise in tumor segmentation by fusing information from these modalities. However, a critical bottleneck exists: these models depend on paired CT-PET data for both training and inference, which poses a challenge because of the limited availability of PET scans. Hence, there is a clear need for a flexible and efficient framework that can be trained on the widely available CT scans and can still be adapted for PET scans when they become available. In this work, we propose a parameter-efficient multi-modal adaptation (PEMMA) framework for lightweight upgrading of a transformer-based segmentation model trained only on CT scans, so that it can be efficiently adapted for use with PET scans. This framework is further extended to perform the prognosis task while maintaining the same efficient cross-modal fine-tuning approach. The proposed approach is tested with two well-known segmentation backbones, namely UNETR and Swin UNETR. Our approach offers two main advantages. Firstly, we leverage the inherent modularity of the transformer architecture and perform low-rank adaptation (LoRA) as well as decomposed low-rank adaptation (DoRA) of the attention weights to achieve parameter-efficient adaptation. Secondly, by minimizing cross-modal entanglement, PEMMA allows updates using only one modality without causing catastrophic forgetting in the other. Our method achieves performance comparable to early fusion with only 8% of the trainable parameters, and demonstrates a significant +28% Dice score improvement on PET scans when trained with a single modality. Furthermore, in prognosis, our method improves the concordance index by +10% when adapting a CT-pretrained model to include PET scans, and by +23% when adapting for both PET and EHR data.
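
To make the low-rank adaptation mentioned in the abstract concrete, the following is a minimal sketch of the standard LoRA update W' = W + (alpha / r) B A applied to a single frozen weight matrix. It is an illustration under stated assumptions, not the authors' implementation: the helper names, the toy dimensions, and the value of alpha are all hypothetical, and plain Python lists stand in for tensors.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A as a new matrix; the frozen W is not modified."""
    r = len(a)  # the rank is the number of rows of A
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_fraction(d_out, d_in, r):
    """LoRA parameters (A and B) relative to the size of one d_out x d_in matrix."""
    return r * (d_in + d_out) / (d_out * d_in)

random.seed(0)
d, r = 8, 2  # toy sizes, chosen only for illustration
w = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]     # frozen weights
a = [[random.gauss(0, 0.01) for _ in range(d)] for _ in range(r)]  # trainable, r x d
b = [[0.0] * r for _ in range(d)]                                  # trainable, d x r, zero-initialised

# With B initialised to zero, the adapted matrix equals the frozen one.
w_adapted = lora_weight(w, a, b, alpha=4.0)
```

Because B starts at zero, the adapted model reproduces the frozen CT model exactly before any fine-tuning step, which is the usual reason for this initialisation. For a hypothetical 768 x 768 matrix with rank 8, `trainable_fraction(768, 768, 8)` evaluates to 1/48, roughly 2.1% of that matrix's parameters. This per-matrix ratio is a separate quantity from the 8% figure reported in the abstract, which refers to the whole model.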

Paper Structure

This paper contains 28 sections, 3 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Overview of our proposed architecture, PEMMA: At the input level, we separate the paths for CT and PET by adding the PET skip connection $\theta_{\textrm{SK}}^{P}$. We freeze both the encoder and decoder of the base segmentation model and introduce a PEFT module (LoRA or DoRA) after each ViT block (x12) as the only trainable layers. Additionally, our flexible architecture allows continual learning by adapting this model to other tasks, such as prognosis.
  • Figure 2: Adapter module: At the input level, we pass the CT and PET images to a patch embedding layer. The adapter module includes an adapter layer and a projection layer. When both modalities are present (left), the adapter is inactive and the patch tokens pass directly to the projection layer. When either modality is missing (middle/right), the adapter layer is activated and the patch tokens pass through the adapter layer followed by the projection layer. The adapter layer reshapes the single-modality input to the multi-modality input expected by the model's architecture.
  • Figure 3: Qualitative results of the multi-modal adaptation stage: We review the detection/segmentation results for both the gross tumor and lymph nodes. Notably, PEMMA generalizes well in organ segmentation and does not generate many false-positive tumors. Overall, DoRA-based PEFT outperforms LoRA-based PEFT.
  • Figure 4: Qualitative results of CL Task 1: We review the detection/segmentation results after fine-tuning with single-modality (CT) and multi-modality (CT+PET) data from the HGJ center.
  • Figure 5: Qualitative results of CL Task 2: We review the detection/segmentation results after fine-tuning with single-modality (CT) and multi-modality (CT+PET) data from the HMR center.
  • ...and 1 more figures
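
The Figure 1 caption names DoRA as an alternative PEFT module. As a rough sketch of the general DoRA idea (not the authors' code), the adapted weight is split into a trainable per-column magnitude and a direction, W' = m * (W + delta) / ||W + delta||, where the norm is taken column-wise and delta is a low-rank update. The function names and the 2 x 2 example values below are hypothetical, and delta is passed in directly rather than built from low-rank factors.

```python
import math

def column_norms(m):
    """Euclidean norm of each column of a matrix stored as a list of rows."""
    return [math.sqrt(sum(row[j] ** 2 for row in m)) for j in range(len(m[0]))]

def dora_weight(w, delta, magnitude):
    """Rescale each column of (W + delta) to the given per-column magnitude."""
    v = [[w_ij + d_ij for w_ij, d_ij in zip(w_row, d_row)]
         for w_row, d_row in zip(w, delta)]
    norms = column_norms(v)
    return [[magnitude[j] * row[j] / norms[j] for j in range(len(norms))] for row in v]

w = [[3.0, 0.0], [4.0, 2.0]]      # frozen pretrained weights (toy values)
delta = [[0.0, 0.0], [0.0, 0.0]]  # low-rank update, zero before training
m = column_norms(w)               # magnitude initialised to the column norms of W

w_init = dora_weight(w, delta, m)              # equals W at initialisation
w_scaled = dora_weight(w, delta, [10.0, 2.0])  # only the first column's magnitude changes
```

Initialising the magnitude to the column norms of the pretrained matrix means the adapted weight starts out equal to the frozen one. Changing the magnitude afterwards rescales a column without rotating it, while delta changes its direction.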