GBT-SAM: A Parameter-Efficient Depth-Aware Model for Generalizable Brain tumour Segmentation on mp-MRI

Cecilia Diana-Albelda, Roberto Alcover-Couso, Álvaro García-Martín, Jesus Bescos, Marcos Escudero-Viñolo

TL;DR

This work tackles automatic brain tumor segmentation on multi-parametric MRI by adapting the Segment Anything Model (SAM) to volumetric data in a parameter-efficient way. The proposed GBT-SAM combines a four-channel patch embedding that fuses T1, T2, T1c, and T2-FLAIR, a two-stage training regime, LoRA-based PEFT, and a Depth-Condition module that captures inter-slice correlations, and it requires only 6.5M trainable parameters. It reaches a Dice score of 93.54 on BraTS Adult Glioma and generalizes across domains to Meningioma, Pediatric Glioma, and Sub-Saharan Glioma. The approach offers a scalable option for clinical workflows, reducing computational cost while maintaining high segmentation accuracy.
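To make the parameter-efficiency claim concrete, the following minimal sketch counts the trainable parameters that a single LoRA adapter adds to one linear layer, compared with fine-tuning that layer in full. The layer width (768) and rank (4) are hypothetical values chosen for illustration; they are not taken from the paper, and the helper names are not part of the GBT-SAM code.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter on a linear layer.

    LoRA keeps the original weight W (d_out x d_in) frozen and learns a
    low-rank update B @ A, with A of shape (rank, d_in) and B of shape
    (d_out, rank). Only A and B are trained.
    """
    return rank * d_in + d_out * rank


def full_finetune_params(d_in: int, d_out: int) -> int:
    """Parameters updated when the whole weight matrix is trained (bias ignored)."""
    return d_in * d_out


# Hypothetical sizes, for illustration only (not values from the paper).
d_in, d_out, rank = 768, 768, 4
lora = lora_trainable_params(d_in, d_out, rank)
full = full_finetune_params(d_in, d_out)
print(f"LoRA: {lora} params, full fine-tuning: {full} params "
      f"({100 * lora / full:.2f}% of the full layer)")
```

Because the adapter cost grows with `rank * (d_in + d_out)` rather than `d_in * d_out`, a small rank keeps the trainable budget to a small fraction of each adapted layer.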

Abstract

Gliomas are aggressive brain tumors that require accurate imaging-based diagnosis, with segmentation playing a critical role in evaluating morphology and treatment decisions. Manual delineation of gliomas is time-consuming and prone to variability, motivating the use of deep learning to improve consistency and alleviate clinical workload. However, existing methods often fail to fully exploit the information available in multi-parametric MRI (mp-MRI), particularly inter-slice contextual features, and typically require considerable computational resources while lacking robustness across tumor type variations. We present GBT-SAM, a parameter-efficient deep learning framework that adapts the Segment Anything Model (SAM), a large-scale vision model, to volumetric mp-MRI data. GBT-SAM reduces input complexity by selecting fewer than 2.6% of slices per scan while incorporating all four MRI modalities, preserving essential tumor-related information with minimal cost. Furthermore, our model is trained by a two-step fine-tuning strategy that incorporates a depth-aware module to capture inter-slice correlations and lightweight adaptation layers, resulting in just 6.5M trainable parameters, which is the lowest among SAM-based approaches. GBT-SAM achieves a Dice Score of 93.54 on the BraTS Adult Glioma dataset and demonstrates robust performance on Meningioma, Pediatric Glioma, and Sub-Saharan Glioma datasets. These results highlight GBT-SAM's potential as a computationally efficient and domain-robust framework for brain tumor segmentation using mp-MRI. Our code and models are available at https://github.com/vpulab/med-sam-brain.
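For readers unfamiliar with the metric, the sketch below computes the standard Dice coefficient between a predicted and a ground-truth binary mask. It is a generic illustration, not the evaluation code from the repository. The handling of two empty masks and the scaling to a 0-100 range (which the reported 93.54 appears to use) are assumptions.

```python
def dice_score(pred, target):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy masks, flattened to 1-D: 2 overlapping pixels, 3 positives in each mask.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 0, 1, 1, 1, 0]
score = dice_score(pred, target)
print(f"Dice: {score:.4f} (as a percentage: {100 * score:.2f})")
```

A volume-level score is typically obtained by applying the same formula to all voxels of a scan, or by averaging per-slice scores; which of the two the paper uses is not stated in this excerpt.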

Paper Structure

This paper contains 22 sections, 10 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: MRI modalities. Examples of tumor visualization in which each column represents a different MRI modality. In these cases, the full tumor extension is not visible in any single modality, highlighting the necessity of leveraging all of them to accurately predict and segment the tumor region. The top row shows the raw MRI images, while the bottom one includes the ground-truth tumor region.
  • Figure 2: Comparison of brain tumor segmentation methods based on performance (Dice Score), publication year, and model size (number of trainable parameters in millions). Pink, green and blue bubbles represent UNet-based, Generative, and SAM-based methods respectively. Our method, marked with a star, almost reaches the highest obtained Dice Score while training the smallest number of parameters, highlighting its efficiency and effectiveness compared to state-of-the-art approaches.
  • Figure 3: GBT-SAM pipeline. In a first training step, we perform slice selection to reduce computational costs while enhancing generalization capability. Moreover, the patch embedding layer is trained, while the rest of the modules remain frozen: image encoder, responsible for extracting features from the input slices; positional encoder, combining features with the bounding box information; and mask decoder, producing the predicted segmentation. In a second training step, the patch embedding layer is further trained alongside additional trainable components introduced in a modified version of the image encoder (depth-aware medical encoder): LoRA blocks and a Depth-Condition module.
  • Figure 4: Ground-truth incongruity due to slice correlation. Examples of MRI slices where some pixels are annotated as tumor regions (highlighted in pink) even though neither the tumor nor the brain structure is visible at those pixels. This happens because doctors may analyse contiguous slices simultaneously, leveraging the volumetric context, which highlights the importance of enriching image features with inter-slice correlations.
  • Figure 5: Depth-Condition Block. The Depth-Condition block consists of three main stages: unfolding the volumetric data along the slice dimension with layer normalization, processing it to extract depth-specific features, and folding the data back into the volumetric structure. This Depth-Conditioned output is then integrated into the ViT block, where it complements the features processed by the LoRA modules, enabling the model to leverage both spatial and volumetric information.
  • ...and 1 more figure
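The three stages named in the Figure 5 caption (unfold along the slice dimension with layer normalization, process along depth, fold back) can be sketched with plain nested lists. This is a toy illustration of the data flow only: the `(D, N, C)` layout, the function names, and the neighbor-averaging step that stands in for the learned depth processing are all assumptions, and the integration with the ViT block and the LoRA modules is not shown.

```python
def unfold_depth(volume):
    """(D, N, C) -> (N, D, C): one depth sequence per token position."""
    depth, tokens = len(volume), len(volume[0])
    return [[volume[d][n] for d in range(depth)] for n in range(tokens)]


def fold_depth(sequences):
    """(N, D, C) -> (D, N, C): inverse of unfold_depth."""
    tokens, depth = len(sequences), len(sequences[0])
    return [[sequences[n][d] for n in range(tokens)] for d in range(depth)]


def layer_norm(vec, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / (var + eps) ** 0.5 for v in vec]


def mix_along_depth(seq):
    """Placeholder for the learned depth processing (an assumption):
    average each slice's features with those of its adjacent slices."""
    mixed = []
    for d in range(len(seq)):
        window = seq[max(0, d - 1):min(len(seq), d + 2)]
        mixed.append([sum(channel) / len(window) for channel in zip(*window)])
    return mixed


def depth_condition(volume):
    """Unfold with layer norm, process along depth, fold back to (D, N, C)."""
    sequences = unfold_depth(volume)
    sequences = [[layer_norm(vec) for vec in seq] for seq in sequences]
    sequences = [mix_along_depth(seq) for seq in sequences]
    return fold_depth(sequences)


# Toy volume: D=3 slices, N=2 tokens per slice, C=2 channels per token.
volume = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
    [[9.0, 10.0], [11.0, 12.0]],
]
out = depth_condition(volume)
print(len(out), len(out[0]), len(out[0][0]))  # same (D, N, C) layout as the input
```

The point of the unfold step is that, after it, each token position is a short sequence over slices, so any sequence operation applied there mixes information across neighboring slices rather than within one slice.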