
Annotation-Efficient Task Guidance for Medical Segment Anything

Tyler Ward, Abdullah-Al-Zubaer Imran

TL;DR

This paper tackles the high cost of labeled data in medical image segmentation by introducing SAM-Mix, a multitask framework that couples GradCAM-based auxiliary classification with a SAM segmentation pipeline. The method automatically generates region-of-interest (ROI) prompts from GradCAM activations and feeds them into a LoRA-adapted SAM, enabling end-to-end training with limited labels. On LiTS liver segmentation, SAM-Mix trained on as few as 50 labeled slices improves Dice by 5.1% over the best baseline, and the same configuration improves Dice by 25.4% on cross-domain segmentation (TotalSegmentator), highlighting both annotation efficiency and robustness. The approach offers a practical path toward high-quality medical segmentation with a reduced annotation burden, and the authors release code for reproducibility.
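The GradCAM-to-prompt step can be illustrated with a minimal NumPy sketch. It assumes you already have the feature maps of one convolutional layer of the classifier and the gradients of the target-class score with respect to them (from any autodiff framework). It applies the standard Grad-CAM recipe (channel weights from spatially averaged gradients, weighted sum of feature maps, ReLU), normalizes the map to [0, 1], and thresholds it into a binary mask plus an `(x_min, y_min, x_max, y_max)` box, the XYXY order SAM's box prompt uses. The 0.5 threshold and the names `grad_cam` and `cam_to_prompts` are hypothetical choices for illustration, not taken from the SAM-Mix code.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap for one conv layer.

    activations, gradients: shape (K, H, W), the feature maps A^k and the
    gradients of the target class score with respect to them.
    """
    # Channel weights: global-average-pool the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum of feature maps; ReLU keeps only positive class evidence.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # (H, W)
    # Normalize to [0, 1] so a fixed threshold can be applied.
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def cam_to_prompts(cam: np.ndarray, threshold: float = 0.5):
    """Hypothetical helper: turn a normalized CAM into mask and box prompts.

    Returns (mask, box); box is (x_min, y_min, x_max, y_max) in pixels,
    or None when no pixel reaches the (assumed) threshold.
    """
    mask = cam >= threshold
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return mask, None
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return mask, box

# Toy example: 2 channels on an 8x8 grid; channel 0 fires on a 3x4 block.
acts = np.zeros((2, 8, 8))
acts[0, 2:5, 3:7] = 1.0
acts[1] = 0.1
grads = np.stack([np.full((8, 8), 0.5), np.full((8, 8), -1.0)])
cam = grad_cam(acts, grads)
mask, box = cam_to_prompts(cam, threshold=0.5)
print(box)  # (3, 2, 6, 4)
```

In a real pipeline the CAM is computed at feature-map resolution and would first be upsampled to the input size before the box is extracted.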

Abstract

Medical image segmentation is a key task in the imaging workflow, influencing many image-based decisions. Traditional, fully-supervised segmentation models rely on large amounts of labeled training data, typically obtained through manual annotation, which can be an expensive, time-consuming, and error-prone process. This signals a need for accurate, automatic, and annotation-efficient methods of training these models. We propose SAM-Mix, a novel multitask learning framework for medical image segmentation that uses class activation maps produced by an auxiliary classifier to guide the predictions of the semi-supervised segmentation branch, which is based on the SAM framework. Experimental evaluations on the public LiTS dataset confirm the effectiveness of SAM-Mix for simultaneous classification and segmentation of the liver from abdominal computed tomography (CT) scans. When trained for 90% fewer epochs on only 50 labeled 2D slices, representing just 0.04% of the available labeled training data, SAM-Mix achieves a Dice improvement of 5.1% over the best baseline model. The generalization results for SAM-Mix are even more impressive, with the same model configuration yielding a 25.4% Dice improvement on a cross-domain segmentation task. Our code is available at https://github.com/tbwa233/SAM-Mix.
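Results above are reported as Dice improvements. For reference, a minimal NumPy sketch of the Dice coefficient on binary masks, Dice = 2|P ∩ G| / (|P| + |G|), is shown below. The smoothing term `eps` (which makes two empty masks score 1.0) is an assumption of this sketch; whether SAM-Mix averages Dice per slice or per volume is not stated in this section.

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient 2|P ∩ G| / (|P| + |G|) between two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    # eps avoids division by zero when both masks are empty.
    return float((2.0 * intersection + eps) / (total + eps))

pred = np.zeros((4, 4), dtype=bool)
pred[:2, :2] = True    # 4 predicted pixels
target = np.zeros((4, 4), dtype=bool)
target[:2, :3] = True  # 6 ground-truth pixels, 4 of them overlapping
print(round(dice_score(pred, target), 4))  # 0.8
```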

Paper Structure

This paper contains 7 sections, 9 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Illustration of the proposed SAM-Mix multitask learning framework. SAM-Mix combines a ResNet-38 classifier for GradCAM generation, automated prompt generation through mask and bounding box extraction, and a SAM-based segmentation pipeline. Low-rank adaptation (LoRA) is used with a rank ($r$) of 8 to make SAM more parameter-efficient. SAM-Mix uses classification-guided attention to produce accurate segmentation masks ($m_1$-$m_3$) with corresponding confidence scores ($s_1$-$s_3$).
  • Figure 2: Dice score box plots for the in-domain and cross-domain tasks demonstrate the superior generalizability of SAM-Mix, even in scarce-label settings.
  • Figure 3: Qualitative comparison demonstrates the superiority of SAM-Mix over other models in segmenting the liver in CT, even when trained on only 5 segmentation labels. Color code: green, ground-truth mask; red, predicted contour.
  • Figure 4: Qualitative comparison demonstrates the superiority of SAM-Mix over other models in generalizing to cross-domain segmentation tasks.
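Figure 1 states that SAM is adapted with LoRA at rank r = 8. As a rough illustration of why this is parameter-efficient, the sketch below wraps one frozen linear layer with the standard LoRA update W + (alpha / r) · B A, where B is zero-initialized so the adapted layer starts out identical to the pretrained one. Which SAM layers SAM-Mix adapts, its scaling factor `alpha`, and the 768-wide layer used here are assumptions for illustration; the class name `LoRALinear` is hypothetical.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: np.ndarray, r: int = 8, alpha: float = 8.0, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                             # frozen, shape (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))   # trainable down-projection
        self.B = np.zeros((d_out, r))                    # trainable up-projection, zero init
        self.scale = alpha / r                           # alpha is an assumed value

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (..., d_in) -> (..., d_out); frozen path plus scaled low-rank path.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self) -> int:
        return self.A.size + self.B.size

d = 768  # hypothetical layer width (768 is the ViT-B embedding size)
rng = np.random.default_rng(1)
layer = LoRALinear(rng.normal(size=(d, d)), r=8)
x = rng.normal(size=(4, d))
# With B = 0 at initialization, the adapted layer reproduces the frozen layer.
assert np.allclose(layer(x), x @ layer.weight.T)
print(layer.trainable_params(), d * d)  # 12288 589824
```

For this single layer, rank 8 trains r · (d_in + d_out) = 12,288 parameters instead of d_in · d_out = 589,824, roughly 2% of the full matrix.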