An Attentive Representative Sample Selection Strategy Combined with Balanced Batch Training for Skin Lesion Segmentation

Stephen Lloyd-Brown, Susan Francis, Caroline Hoad, Penny Gowland, Karen Mullinger, Andrew French, Xin Chen

TL;DR

Medical image segmentation suffers from annotation bottlenecks; this paper introduces a one-shot active learning pipeline that combines prototypical contrastive learning with clustering to select representative samples, together with an unsupervised balanced batch loading strategy to improve learning from a small labeled subset. By automatically determining the cluster count with a bespoke $K$-optimization and sampling $D$ images per cluster (where $D = N/K$), the method maintains diversity while minimizing labeling effort; the nnU-Net segmentation model is trained under balanced batches, yielding superior performance on the ISIC-2018 skin lesion dataset compared to a state-of-the-art contrastive annotation method and random sampling, with ablations confirming the benefits of balanced batching and cluster-aware features. The approach promises improved sample efficiency for medical image segmentation and can be extended to other modalities such as MRI, with potential integration of semi-supervised learning and augmentation for clinical deployment.
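The per-cluster budget described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that $N$ is the total annotation budget, that feature vectors and cluster assignments have already been computed, and that the images chosen within a cluster are those closest to the cluster centroid (the summary does not state the within-cluster criterion). The function name `select_per_cluster` is hypothetical.

```python
import numpy as np

def select_per_cluster(features, labels, budget):
    """Pick D = budget // K sample indices from each of the K clusters.

    Assumption (not from the paper): within each cluster, the samples
    nearest to the cluster centroid are the ones selected.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    per_cluster = budget // len(clusters)  # D = N / K, rounded down
    selected = []
    for c in clusters:
        idx = np.flatnonzero(labels == c)            # members of cluster c
        centroid = features[idx].mean(axis=0)
        dist = np.linalg.norm(features[idx] - centroid, axis=1)
        order = np.argsort(dist, kind="stable")      # closest first
        selected.extend(idx[order[:per_cluster]].tolist())
    return selected

# Toy data: 60 random feature vectors split evenly into 3 clusters.
rng = np.random.default_rng(0)
features = rng.normal(size=(60, 8))
labels = np.repeat(np.arange(3), 20)
chosen = select_per_cluster(features, labels, budget=12)
print(len(chosen))  # 12
```

Splitting the budget evenly keeps every cluster represented in the annotated subset, which is the diversity property the summary attributes to the method. When the budget is not divisible by $K$, this sketch simply leaves the remainder unused.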

Abstract

An often overlooked problem in medical image segmentation research is the effective selection of training subsets to annotate from a complete set of unlabelled data. Many studies select their training sets at random, which may lead to suboptimal model performance, especially in the minimal supervision setting where each training image has a profound effect on performance outcomes. This work aims to address this issue. We use prototypical contrastive learning and clustering to extract representative and diverse samples for annotation. We improve upon prior works with a bespoke cluster-based image selection process. Additionally, we introduce the concept of unsupervised balanced batch data loading to medical image segmentation, which aims to improve model learning with minimally annotated data. We evaluated our method on a public skin lesion dataset (ISIC 2018) and compared it to another state-of-the-art data sampling method. Our method achieved superior performance in a low annotation budget scenario.
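One way to picture balanced batch loading is a sampler that draws each training batch as evenly as possible across the clusters, using the unsupervised cluster assignments in place of class labels. The sketch below is an assumption about how such a loader could work, not the paper's code. The function name `balanced_batches`, the `cluster_of` mapping, and sampling with replacement are all illustrative choices.

```python
import random
from collections import defaultdict

def balanced_batches(cluster_of, batch_size, num_batches, seed=0):
    """Yield batches of sample ids spread evenly across clusters.

    cluster_of: dict mapping sample id -> cluster id (hypothetical input).
    Samples are drawn with replacement inside each cluster, so small
    clusters are oversampled rather than dropped.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for sample_id, cluster in cluster_of.items():
        by_cluster[cluster].append(sample_id)
    clusters = sorted(by_cluster)
    for _ in range(num_batches):
        # Shuffle so that leftover slots do not always favor the same clusters.
        rng.shuffle(clusters)
        batch = []
        for slot in range(batch_size):
            cluster = clusters[slot % len(clusters)]  # round-robin over clusters
            batch.append(rng.choice(by_cluster[cluster]))
        yield batch

# Toy example: 20 annotated samples assigned to 4 clusters.
cluster_of = {i: i % 4 for i in range(20)}
batches = list(balanced_batches(cluster_of, batch_size=8, num_batches=3))
```

With 4 clusters and a batch size of 8, every batch here contains exactly two samples from each cluster. In a real training loop the yielded ids would index into the annotated dataset.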

Paper Structure

This paper contains 13 sections, 2 equations, 3 figures, 1 algorithm.

Figures (3)

  • Figure 1: Diagram of our sample selection pipeline.
  • Figure 2: Diagram of the ISIC 2018 feature space, projected into 2D using t-SNE [van2008visualizing] and color coded by cluster assignment, with clusters assigned using our pipeline discussed throughout the Methodology section. We can see groupings based on size, opacity and color.
  • Figure 3: Dice score reported per method and annotation budget. We compare our method, our method without balanced batch loading (Ablation), the SOTA method Contrastive Annotation [jin2022one] (Competitor), our method using a powerful non-clustering feature extractor (MedSAM [ma2024segment]) and random selection (4 runs).