Domain and Task-Focused Example Selection for Data-Efficient Contrastive Medical Image Segmentation
Tyler Ward, Aaron Moseley, Abdullah-Al-Zubaer Imran
TL;DR
This work tackles data-efficient medical image segmentation under limited annotations by introducing PolyCL, a two-stage self-supervised framework that pre-trains segmentation encoders using novel organ-based, scan-based, and mixed example selection strategies. A downstream decoder is then trained with limited labeled data, achieving competitive Dice scores and improved boundary accuracy. The authors further enhance performance with the Segment Anything Model (SAM) in two ways: SAM-based refinement of predicted masks, and SAM 2-driven 3D propagation from a single annotated slice, which markedly boosts results in ultra-low-label regimes. Experiments on LiTS, TotalSegmentator, and MSD demonstrate strong in-domain and cross-domain generalization, and the code is released for reproducibility.
Abstract
Segmentation is one of the most important tasks in the medical imaging pipeline, as it influences a number of image-based decisions. To be effective, fully supervised segmentation approaches require large amounts of manually annotated training data. However, pixel-level annotation is expensive, time-consuming, and error-prone, which hinders progress and makes effective segmentation difficult. Models must therefore learn efficiently from limited labeled data. Self-supervised learning (SSL), particularly contrastive learning that pre-trains on unlabeled data and fine-tunes on limited annotations, can enable segmentation in such limited-label settings. To this end, we propose PolyCL, a novel self-supervised contrastive learning framework for medical image segmentation that leverages inherent relationships among different images. Without requiring any pixel-level annotations or unreasonable data augmentations, PolyCL learns and transfers context-aware discriminant features useful for segmentation from an innovative surrogate, in a task-related manner. Additionally, we integrate the Segment Anything Model (SAM) into our framework in two novel ways: as a post-processing refinement module that improves the accuracy of predicted masks using bounding box prompts derived from coarse outputs, and as a propagation mechanism via SAM 2 that generates volumetric segmentations from a single annotated 2D slice. Experimental evaluations on three public computed tomography (CT) datasets demonstrate that PolyCL outperforms fully supervised and self-supervised baselines in both low-data and cross-domain scenarios. Our code is available at https://github.com/tbwa233/PolyCL.
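As an illustration of how a bounding box prompt might be derived from a coarse predicted mask, the following minimal sketch computes the tightest box around the foreground of a 2D binary mask. It is not taken from the PolyCL code: the function name `mask_to_box`, the `margin` parameter, and the (x_min, y_min, x_max, y_max) output order are assumptions made for illustration, and the step of passing the box to SAM is omitted.

```python
import numpy as np


def mask_to_box(mask, margin=0):
    """Hypothetical helper: tightest box around the foreground of a 2D mask.

    Returns (x_min, y_min, x_max, y_max) in pixel coordinates, expanded by
    `margin` pixels and clipped to the image, or None for an empty mask.
    """
    mask = np.asarray(mask).astype(bool)
    if mask.ndim != 2:
        raise ValueError("expected a 2D mask")

    # Indices of rows and columns that contain at least one foreground pixel.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None  # nothing was predicted, so there is no box to prompt with

    height, width = mask.shape
    x_min = max(int(cols[0]) - margin, 0)
    y_min = max(int(rows[0]) - margin, 0)
    x_max = min(int(cols[-1]) + margin, width - 1)
    y_max = min(int(rows[-1]) + margin, height - 1)
    return (x_min, y_min, x_max, y_max)


# Toy coarse prediction: a 3x4 foreground block inside an 8x8 slice.
coarse = np.zeros((8, 8), dtype=np.uint8)
coarse[2:5, 3:7] = 1
box = mask_to_box(coarse, margin=1)
print(box)  # (2, 1, 7, 5)
```

A small margin is one plausible design choice here, since a coarse mask may under-segment the organ boundary and a slightly looser box leaves the refinement model room to recover it.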
