MedSAGa: Few-shot Memory Efficient Medical Image Segmentation using Gradient Low-Rank Projection in SAM

Navyansh Mahla, Annie D'souza, Shubh Gupta, Bhavik Kanekar, Kshitij Sharad Jadhav

TL;DR

MedSAGa tackles memory and data constraints in medical image segmentation by integrating Gradient Low-Rank Projection (GaLore) with the Segment Anything Model (SAM) to enable memory-efficient, few-shot fine-tuning of the image encoder. The prompt encoder and mask decoder are fine-tuned conventionally, which keeps the training footprint small while producing multiple masks that are fused into segmentation maps for $k$ classes. Across four diverse datasets, MedSAGa achieves substantial memory savings (about 66% more efficient on average) with segmentation performance competitive with SOTA baselines such as SAMed and DAE-Former in low-data regimes. This work demonstrates the practicality of GaLore-enabled ViT fine-tuning for medical image segmentation, enabling deployment on memory-constrained hardware with accuracy comparable to these baselines.

Abstract

The application of large-scale models in medical image segmentation demands substantial quantities of meticulously annotated data curated by experts, along with high computational resources, both of which are challenges in resource-poor settings. In this study, we present the Medical Segment Anything Model with GaLore (MedSAGa), in which we adapt the Segment Anything Model (SAM) to achieve memory-efficient, few-shot medical image segmentation by applying Gradient Low-Rank Projection (GaLore) to the parameters of SAM's image encoder. Meanwhile, the weights of the prompt encoder and mask decoder undergo full-parameter fine-tuning using standard optimizers. We further assess MedSAGa's few-shot learning capabilities, reporting its memory efficiency and segmentation performance across multiple standard medical image segmentation datasets. We compare it with several baseline models, including LoRA fine-tuned SAM (SAMed) and DAE-Former. Experiments across multiple datasets and these baseline models, with different numbers of images used for fine-tuning, demonstrate that the GPU memory consumption of MedSAGa is significantly lower than that of the baseline models, with an average memory efficiency 66% higher than that of current state-of-the-art (SOTA) models for medical image segmentation. The combination of substantially lower memory requirements and results comparable to SOTA in few-shot learning for medical image segmentation positions MedSAGa as a strong candidate for deployment in resource-constrained settings.
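
To make the idea of gradient low-rank projection concrete, the following is a minimal NumPy sketch of one GaLore-style Adam step on a single weight matrix. It is an illustration under stated assumptions, not the authors' implementation: the function name `galore_adam_step`, the hyperparameter values (`rank`, `scale`, `update_proj_gap`), and the left-projection-only form are all hypothetical choices. The point it shows is that the Adam moments are stored at shape `(rank, n)` rather than `(m, n)`, which is where the optimizer-state memory saving comes from.

```python
import numpy as np

def galore_adam_step(W, G, state, rank=4, lr=1e-3, scale=0.25,
                     betas=(0.9, 0.999), eps=1e-8, update_proj_gap=200):
    """One GaLore-style Adam step for an (m, n) weight W with gradient G.

    Illustrative sketch only; names and defaults are assumptions.
    """
    step = state.get("step", 0)

    # Recompute the projector from the gradient's top left singular vectors,
    # on the first step and then every `update_proj_gap` steps.
    if "P" not in state or step % update_proj_gap == 0:
        U, _, _ = np.linalg.svd(G, full_matrices=False)
        state["P"] = U[:, :rank]                      # shape (m, rank)
    P = state["P"]

    # Project the gradient into the low-rank subspace.
    R = P.T @ G                                       # shape (rank, n)

    # Adam moments live in the low-rank space, not at full (m, n) size.
    m = state.get("m", np.zeros_like(R))
    v = state.get("v", np.zeros_like(R))
    step += 1
    m = betas[0] * m + (1.0 - betas[0]) * R
    v = betas[1] * v + (1.0 - betas[1]) * R ** 2
    m_hat = m / (1.0 - betas[0] ** step)
    v_hat = v / (1.0 - betas[1] ** step)
    N = m_hat / (np.sqrt(v_hat) + eps)                # low-rank update direction

    state.update(step=step, m=m, v=v)

    # Project the update back to the full weight shape and apply it.
    return W - lr * scale * (P @ N)
```

For a 64 x 128 weight matrix with `rank=4`, the two moment buffers in this sketch hold 2 x 4 x 128 = 1,024 values instead of 2 x 64 x 128 = 16,384 for full-size Adam state. In the setup described above, such a step would apply only to the image encoder's matrices, while the prompt encoder and mask decoder would use a standard optimizer.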

Paper Structure (13 sections, 4 equations, 4 figures, 3 tables)

This paper contains 13 sections, 4 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Memory consumption of MedSAGa and the other standard baselines while fine-tuning them for the medical image segmentation task.
  • Figure 2: Dice score vs. number of images used for fine-tuning on the ChestX-ray8 dataset. For most models, the curve plateaus at approximately 500 images.
  • Figure 3: The architecture of MedSAGa. GaLore optimization is applied to fine-tune only the image encoder. Due to their lightweight characteristics, the Mask Decoder and the default embeddings of the Prompt Encoder are fine-tuned directly on the medical images without applying GaLore.
  • Figure 4: Comparison of loss curves between fine-tuning with and without warmup on the ChestX-ray8 dataset.
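
Figure 2 reports segmentation quality as a Dice score. For readers unfamiliar with the metric, here is a minimal sketch of the standard Dice coefficient for a pair of binary masks, 2|A ∩ B| / (|A| + |B|). The function name `dice_score` and the smoothing constant `eps` are illustrative assumptions; the paper's exact evaluation code (for example, how scores are averaged over classes) is not shown in this section.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks of the same shape.

    `eps` is a small smoothing term (an assumption of this sketch) so that
    two empty masks score 1.0 instead of dividing by zero.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```

A multi-class score could be obtained by computing this per class on one-vs-rest masks and averaging, though that aggregation choice is also an assumption here.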