An Attentive Representative Sample Selection Strategy Combined with Balanced Batch Training for Skin Lesion Segmentation
Stephen Lloyd-Brown, Susan Francis, Caroline Hoad, Penny Gowland, Karen Mullinger, Andrew French, Xin Chen
TL;DR
Medical image segmentation suffers from an annotation bottleneck. This paper introduces a one-shot active learning pipeline that combines prototypical contrastive learning with clustering to select representative samples for annotation, together with an unsupervised balanced batch loading strategy that improves learning from a small labeled subset. The cluster count $K$ is determined automatically by a bespoke $K$-optimization, and $D$ images are sampled per cluster (where $D = N/K$), which maintains diversity while minimizing labeling effort. An nnU-Net segmentation model trained with balanced batches outperforms both a state-of-the-art contrastive annotation method and random sampling on the ISIC-2018 skin lesion dataset, and ablations confirm the benefits of balanced batching and cluster-aware features. The approach could improve sample efficiency for medical image segmentation and could be extended to other modalities such as MRI, with potential integration of semi-supervised learning and augmentation for clinical deployment.
Abstract
An often overlooked problem in medical image segmentation research is how to select the training subset to annotate from a complete set of unlabelled data. Many studies select their training sets at random, which may lead to suboptimal model performance, especially in the minimal supervision setting, where each training image has a profound effect on the outcome. This work aims to address this issue. We use prototypical contrastive learning and clustering to extract representative and diverse samples for annotation, and we improve upon prior work with a bespoke cluster-based image selection process. Additionally, we introduce the concept of unsupervised balanced batch data loading to medical image segmentation, which aims to improve model learning from minimally annotated data. We evaluated our method on a public skin lesion dataset (ISIC 2018) and compared it to another state-of-the-art data sampling method. Our method achieved superior performance in a low annotation budget scenario.
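
As an illustration of the two ideas described above, the following is a minimal sketch, not the authors' implementation. It assumes that features and cluster assignments have already been computed, that $N$ in $D = N/K$ denotes the annotation budget, and that samples are ranked by distance to their cluster centroid; the summary does not state the paper's actual selection rule. The function names `select_per_cluster` and `balanced_batches` are hypothetical.

```python
import numpy as np


def select_per_cluster(features, labels, centroids, budget):
    """Pick D = budget // K samples per cluster, nearest to each centroid.

    Nearest-to-centroid ranking is an assumption made for illustration.
    """
    k = centroids.shape[0]
    per_cluster = budget // k  # D = N / K, assuming N is the annotation budget
    selected = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        # Euclidean distance of each cluster member to its own centroid.
        dists = np.linalg.norm(features[members] - centroids[c], axis=1)
        selected.extend(members[np.argsort(dists)[:per_cluster]].tolist())
    return selected


def balanced_batches(indices, labels, batch_size, rng):
    """Yield batches that draw equally from every cluster present in `indices`."""
    indices = np.asarray(indices)
    clusters = np.unique(labels[indices])
    per_cluster = batch_size // len(clusters)
    if per_cluster == 0:
        raise ValueError("batch_size must be at least the number of clusters")
    # Shuffle each cluster's selected samples once, then slice them per batch.
    pools = {c: rng.permutation(indices[labels[indices] == c]) for c in clusters}
    n_batches = min(len(p) for p in pools.values()) // per_cluster
    for b in range(n_batches):
        batch = np.concatenate(
            [pools[c][b * per_cluster:(b + 1) * per_cluster] for c in clusters]
        )
        yield rng.permutation(batch).tolist()


# Toy usage with synthetic features and stand-in cluster assignments.
rng = np.random.default_rng(0)
features = rng.normal(size=(60, 8))
labels = np.repeat(np.arange(3), 20)
centroids = np.stack([features[labels == c].mean(axis=0) for c in range(3)])

chosen = select_per_cluster(features, labels, centroids, budget=12)
batches = list(balanced_batches(chosen, labels, batch_size=6, rng=rng))
```

In this toy setup a budget of 12 over 3 clusters gives 4 images per cluster, and each batch of 6 contains 2 images from every cluster. No ground-truth labels are used at any point, which is what makes the balancing unsupervised.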
