Exploring UMAP in hybrid models of entropy-based and representativeness sampling for active learning in biomedical segmentation
H. S. Tan, Kuancheng Wang, Rafe Mcbeth
TL;DR
This paper investigates UMAP-based representativeness sampling within hybrid active-learning schemes for biomedical image segmentation, focusing on pipelines in which an entropy-based uncertainty step precedes a UMAP-driven representativeness step. By re-embedding the current model's features at each iteration and clustering the embedding to select representative samples, Entropy-UMAP achieves the largest Dice gains over a random baseline ($3.2\%$ for cardiac and $4.5\%$ for prostate) and matches or exceeds full-dataset performance on the heart task ($0.96$ Dice). Across two MRI datasets from the Medical Segmentation Decathlon, the approach reduces the labeling burden by approximately $43\%$ (cardiac) and $25\%$ (prostate) while maintaining competitive performance, indicating a meaningful synergy between uncertainty and topological feature representations. The findings motivate further exploration of UMAP-based sampling in domain adaptation and other active-learning contexts, with potential clinical impact for efficient, accurate biomedical segmentation.
Abstract
In this work, we study various hybrid models of entropy-based and representativeness sampling techniques in the context of active learning for medical segmentation, in particular examining the role of UMAP (Uniform Manifold Approximation and Projection) as a technique for capturing representativeness. Although UMAP has been shown to be viable as a general-purpose dimension reduction method in diverse areas, its role in deep learning-based medical segmentation has yet to be extensively explored. Using the cardiac and prostate datasets in the Medical Segmentation Decathlon for validation, we found that a novel hybrid Entropy-UMAP sampling technique achieved a statistically significant Dice score advantage over the random baseline ($3.2\%$ for cardiac, $4.5\%$ for prostate) and attained the highest Dice coefficient among the 10 active learning methods we examined. This provides preliminary evidence of a synergy between entropy-based and UMAP methods when the former precedes the latter in a hybrid model of active learning.
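The two-stage selection described above (entropy first, representativeness second) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `pool_factor` parameter, the use of k-means for clustering, and the rule of picking the sample nearest each cluster center are all assumptions made for illustration. To keep the example free of extra dependencies, the default embedding is a PCA projection standing in for UMAP; with the umap-learn package installed, one could instead pass `embed=lambda x: umap.UMAP(n_components=2).fit_transform(x)`.

```python
import numpy as np

def mean_entropy(probs, eps=1e-12):
    """Per-sample uncertainty: Shannon entropy over classes, averaged over pixels.

    probs has shape (n_samples, n_classes, n_pixels) and holds softmax outputs.
    """
    return -(probs * np.log(probs + eps)).sum(axis=1).mean(axis=1)

def pca_embed(features, n_components=2):
    """Stand-in embedding (PCA via SVD). Swap in UMAP here if it is available."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns cluster labels and centers."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

def entropy_then_representative(probs, features, budget, pool_factor=3,
                                embed=pca_embed):
    """Hypothetical hybrid selection: entropy filter, then cluster-based picks.

    1. Keep the pool_factor * budget most uncertain unlabeled samples.
    2. Embed the current model features of that pool in a low-dimensional space.
    3. Cluster the embedding into `budget` groups and take the sample closest
       to each cluster center as its representative.
    """
    scores = mean_entropy(probs)
    pool_size = min(len(scores), pool_factor * budget)
    pool = np.argsort(scores)[::-1][:pool_size]  # most uncertain first
    emb = embed(features[pool])
    labels, centers = kmeans(emb, k=budget)
    chosen = []
    for j in range(budget):
        members = np.flatnonzero(labels == j)
        if members.size == 0:  # an empty cluster contributes no sample
            continue
        d = np.linalg.norm(emb[members] - centers[j], axis=1)
        chosen.append(int(pool[members[d.argmin()]]))
    return sorted(chosen)
```

In an active-learning loop, `probs` and `features` would be recomputed from the retrained model at every iteration, so the embedding adapts as the model changes. The returned indices are the samples to send for annotation; if a cluster ends up empty, fewer than `budget` samples are returned.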
