
Domain and Task-Focused Example Selection for Data-Efficient Contrastive Medical Image Segmentation

Tyler Ward, Aaron Moseley, Abdullah-Al-Zubaer Imran

TL;DR

This work tackles data-efficient medical image segmentation under limited annotations by introducing PolyCL, a two-stage self-supervised framework that uses novel organ-based, scan-based, and mixed example selection strategies to pre-train encoders for segmentation. A downstream decoder is then trained with limited labeled data, achieving competitive Dice scores and improved boundary accuracy. The authors further enhance performance with Segment Anything Model (SAM) in two ways: SAM-based mask refinement and SAM 2-driven 3D propagation from a single annotated slice, markedly boosting results in ultra-low-label regimes. Experiments on LiTS, TotalSegmentator, and MSD demonstrate strong in-domain and cross-domain generalization and practical improvements, with code released for reproducibility.

Abstract

Segmentation is one of the most important tasks in the medical imaging pipeline because it influences many image-based decisions. To be effective, fully supervised segmentation approaches require large amounts of manually annotated training data. However, pixel-level annotation is expensive, time-consuming, and error-prone, which hinders progress and makes effective segmentation difficult. Models must therefore learn efficiently from limited labeled data. Self-supervised learning (SSL), particularly contrastive learning via pre-training on unlabeled data and fine-tuning on limited annotations, can facilitate segmentation when few labeled images are available. To this end, we propose PolyCL, a novel self-supervised contrastive learning framework for medical image segmentation that leverages inherent relationships among different images. Without requiring any pixel-level annotations or unreasonable data augmentations, PolyCL learns and transfers context-aware, discriminative features useful for segmentation from an innovative surrogate, in a task-related manner. Additionally, we integrate the Segment Anything Model (SAM) into our framework in two novel ways: as a post-processing refinement module that improves the accuracy of predicted masks using bounding box prompts derived from coarse outputs, and as a propagation mechanism via SAM 2 that generates volumetric segmentations from a single annotated 2D slice. Experimental evaluations on three public computed tomography (CT) datasets demonstrate that PolyCL outperforms fully supervised and self-supervised baselines in both low-data and cross-domain scenarios. Our code is available at https://github.com/tbwa233/PolyCL.
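
The abstract mentions bounding box prompts derived from coarse outputs. As a rough illustration of that single step, the sketch below turns a 2D binary mask into a tight box. The function name mask_to_box, the (x_min, y_min, x_max, y_max) ordering, and the pad parameter are assumptions made for this example; they are not taken from the paper or its code release.

```python
import numpy as np

def mask_to_box(mask, pad=0):
    """Return (x_min, y_min, x_max, y_max) enclosing the foreground of a
    2D binary mask, or None if the mask is empty.

    Hypothetical helper: the box ordering and `pad` are illustrative
    assumptions, not the authors' implementation.
    """
    ys, xs = np.nonzero(mask)          # row and column indices of foreground pixels
    if ys.size == 0:
        return None                    # nothing predicted, so no prompt can be built
    h, w = mask.shape
    # Expand the tight box by `pad` pixels and clip to the image bounds.
    x0 = max(int(xs.min()) - pad, 0)
    y0 = max(int(ys.min()) - pad, 0)
    x1 = min(int(xs.max()) + pad, w - 1)
    y1 = min(int(ys.max()) + pad, h - 1)
    return (x0, y0, x1, y1)

# Usage: a coarse mask with foreground in rows 2..4 and columns 3..6.
coarse = np.zeros((8, 8), dtype=np.uint8)
coarse[2:5, 3:7] = 1
box = mask_to_box(coarse)
```

A small padding margin is one way to tolerate a coarse mask that slightly under-segments the organ; whether the authors pad their boxes is not stated here.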

Paper Structure

This paper contains 23 sections, 11 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Example selection strategies for the proposed PolyCL framework: PolyCL-S uses the information of the CT scan to which each slice belongs, while PolyCL-O uses the information of whether each slice contains the organ-of-interest, and PolyCL-M uses both the organ-based and scan-based information.
  • Figure 2: Sample liver-absent (left) and liver-present (right) CT slices.
  • Figure 3: Overview of the proposed PolyCL training framework for medical image segmentation. The method follows a two-stage process: (1) self-supervised pre-training using contrastive learning with one of three example selection strategies (PolyCL-S, PolyCL-O, or PolyCL-M), where the encoder learns discriminative, task-relevant features from unlabeled data; and (2) supervised fine-tuning, where a decoder is added and the full model is trained on a small set of labeled images using Dice loss to predict segmentation masks.
  • Figure 4: Overview of the SAM-based mask refinement process integrated with PolyCL. Coarse segmentation masks produced by the fine-tuned PolyCL model are first converted into bounding box prompts. These prompts, along with the corresponding CT slices, are then passed into SAM, which refines the masks using a combination of sparse prompt embeddings and dense image embeddings to generate anatomically accurate segmentations.
  • Figure 5: Illustration of the SAM 2-based mask propagation framework integrated with PolyCL. A single annotated reference slice from a 3D CT volume is used to initialize SAM 2. The model then propagates segmentation predictions slice-by-slice through the volume, leveraging spatio-temporal consistency to generate accurate 3D segmentations without requiring additional annotations.
  • ...and 2 more figures
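
The three example selection strategies in the Figure 1 caption can be sketched as a rule for deciding which slices in a batch count as positives for one another. This is a minimal sketch under stated assumptions: the function name positive_mask is hypothetical, treating slices with the same organ-presence label as positives is one reading of PolyCL-O, and combining the two criteria with a logical AND is only one plausible reading of "uses both" for PolyCL-M. The paper's exact pairing rules may differ.

```python
import numpy as np

def positive_mask(scan_ids, has_organ, strategy):
    """Build a boolean (N, N) matrix whose entry [i, j] is True when
    slice j is treated as a positive example for slice i.

    Hypothetical helper illustrating the caption's three strategies:
      "S": same source CT scan
      "O": same organ-presence label
      "M": both criteria at once (AND is an assumption of this sketch)
    """
    scan_ids = np.asarray(scan_ids)
    has_organ = np.asarray(has_organ, dtype=bool)
    same_scan = scan_ids[:, None] == scan_ids[None, :]
    same_organ = has_organ[:, None] == has_organ[None, :]
    if strategy == "S":
        pos = same_scan
    elif strategy == "O":
        pos = same_organ
    elif strategy == "M":
        pos = same_scan & same_organ   # assumed combination rule
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    pos = pos.copy()
    np.fill_diagonal(pos, False)       # a slice is not its own positive
    return pos

# Usage: four slices from two scans; the second slice lacks the organ.
scans = [0, 0, 1, 1]
organ = [True, False, True, True]
pos_s = positive_mask(scans, organ, "S")
pos_o = positive_mask(scans, organ, "O")
pos_m = positive_mask(scans, organ, "M")
```

A matrix like this could feed a supervised-contrastive-style loss in which True entries mark positive pairs and the remaining off-diagonal entries act as negatives. That pairing with a loss is likewise an assumption of this sketch.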