Exploring UMAP in hybrid models of entropy-based and representativeness sampling for active learning in biomedical segmentation

H. S. Tan, Kuancheng Wang, Rafe Mcbeth

TL;DR

This paper investigates UMAP-based representativeness within hybrid active-learning schemes for biomedical image segmentation, focusing on hybrids in which an entropy-based uncertainty step precedes a UMAP-driven representativeness step. By re-embedding the current model's features at each iteration and clustering the embedding to select representative samples, Entropy-UMAP achieves the largest Dice gains over a random baseline ($3.2\%$ for cardiac and $4.5\%$ for prostate) and matches or exceeds full-dataset performance on the cardiac task ($0.96$ Dice). Across the two Medical Segmentation Decathlon MRI datasets, the approach reduces the labeling burden by approximately $43\%$ (cardiac) and $25\%$ (prostate) while maintaining competitive performance, suggesting a synergy between uncertainty and topological feature representations. The findings motivate further exploration of UMAP-based sampling in domain adaptation and other active-learning contexts, with potential clinical impact for efficient, accurate biomedical segmentation.
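The Entropy-UMAP ordering described above can be sketched as a two-stage selection. This is a minimal illustration under stated assumptions, not the authors' implementation: per-image entropy scores are assumed to be precomputed, an entropy shortlist of `n_candidates` images is assumed to be taken first (the shortlist size is a hypothetical hyperparameter), k-means with one cluster per queried image is assumed as the clustering step, and the shortlist member nearest each centroid is selected. The embedding is passed in as a function so the sketch runs with NumPy alone; with the `umap-learn` package installed, `embed=lambda X: umap.UMAP(n_components=2).fit_transform(X)` refits UMAP on the current features each round. All function names here are hypothetical.

```python
from typing import Callable

import numpy as np


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's k-means on the rows of X; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids, labels


def entropy_then_umap_query(
    features: np.ndarray,        # (N, D) current-model features of the unlabeled pool
    entropy_scores: np.ndarray,  # (N,) per-image uncertainty; higher = less certain
    n_query: int,                # images to send for annotation this iteration
    n_candidates: int,           # size of the entropy shortlist (assumed hyperparameter)
    embed: Callable[[np.ndarray], np.ndarray],  # e.g. a UMAP fit_transform
) -> np.ndarray:
    if not n_query <= n_candidates <= len(features):
        raise ValueError("need n_query <= n_candidates <= pool size")
    # Stage 1 (uncertainty): keep the n_candidates highest-entropy images.
    shortlist = np.argsort(entropy_scores)[::-1][:n_candidates]
    # Stage 2 (representativeness): embed the shortlist's features, then cluster.
    Z = embed(features[shortlist])
    centroids, _ = kmeans(Z, n_query)
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=-1)
    chosen: list[int] = []
    for j in range(n_query):
        # Take the not-yet-chosen shortlist member closest to centroid j.
        for i in np.argsort(dists[:, j]):
            if int(i) not in chosen:
                chosen.append(int(i))
                break
    # Map shortlist positions back to indices in the unlabeled pool.
    return shortlist[chosen]
```

Swapping the two stages (cluster the whole pool first, then pick the highest-entropy member of each cluster) would give a sketch of the opposite, representativeness-first ordering.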

Abstract

In this work, we study various hybrid models of entropy-based and representativeness sampling techniques for active learning in medical image segmentation, in particular examining the role of UMAP (Uniform Manifold Approximation and Projection) as a technique for capturing representativeness. Although UMAP has been shown to be viable as a general-purpose dimension reduction method in diverse areas, its role in deep learning-based medical segmentation has yet to be extensively explored. Using the cardiac and prostate datasets of the Medical Segmentation Decathlon for validation, we found that a novel hybrid Entropy-UMAP sampling technique achieved a statistically significant Dice score advantage over the random baseline ($3.2\%$ for cardiac, $4.5\%$ for prostate) and attained the highest Dice coefficient among the 10 distinct active learning methods we examined. This provides preliminary evidence of an interesting synergy between entropy-based and UMAP methods when the former precedes the latter in a hybrid model of active learning.
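For the entropy-based half of these hybrids, one common way to turn a segmentation network's softmax output into a single per-image uncertainty score is the mean pixel-wise Shannon entropy. The sketch below assumes that aggregation (the paper may aggregate differently, for example by summing or restricting to foreground pixels), and the function names are hypothetical.

```python
import numpy as np


def mean_pixel_entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Mean Shannon entropy per image.

    probs: (N, C, H, W) softmax outputs, summing to 1 over the class axis C.
    Returns an (N,) array; larger values mean the model is less certain.
    """
    # Entropy of the class distribution at every pixel (eps avoids log(0)).
    pixel_entropy = -np.sum(probs * np.log(probs + eps), axis=1)  # (N, H, W)
    # Average over the spatial dimensions to get one score per image.
    return pixel_entropy.mean(axis=(1, 2))


def top_k_uncertain(probs: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k images with the highest mean pixel entropy."""
    return np.argsort(mean_pixel_entropy(probs))[::-1][:k]
```

A fully confident prediction scores about 0, while a uniform two-class prediction scores $\ln 2 \approx 0.693$ per pixel, the maximum for two classes.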

Paper Structure

This paper contains 11 sections, 10 equations, 6 figures, 2 tables, and 1 algorithm.

Figures (6)

  • Figure 1: A simple sketch of the overall structure and logical flow of our active learning methods. The blue lines pertain to 'representativeness-Entropy' hybrid algorithms, whereas the red ones pertain to the 'Entropy-representativeness' class. We also studied purely entropy-based and purely representativeness-based sampling methods for comparison. For the annotation stage, before images enter the labeled pool, we used their ground-truth masks as a stand-in for a medical expert manually segmenting them.
  • Figure 2: Evolution of the validation Dice scores for various hybrid models of uncertainty and representativeness sampling. Each active learning iteration involved 10 training epochs and acquired 6 samples for the prostate dataset and 16 for the cardiac dataset.
  • Figure 3: Learning curves for larger initial (labeled) datasets. The initial size of $\mathcal{D}_L$ was chosen so that the number of samples acquired per iteration and the final size of $\mathcal{D}_L$ remained the same as in Figure 2, while the number of active learning iterations was halved.
  • Figure 4: Two-dimensional projections of the feature vector space obtained via PCA and UMAP. Grey crosses superimposed on the training data are validation images projected with the PCA and UMAP models fitted on the training dataset. For both PCA and UMAP, the validation data distribution closely aligns with that of the training dataset. The UMAP-transformed space consists of more isolated and linear clusters, in contrast to the spherical and uniform ones in the PCA-transformed space.
  • Figure 5: An illustrative set of prostate MRI images with contours outlined in red for various hybrid and non-hybrid models, overlaid on the ground-truth mask in yellow, which covers the combined peripheral and central zones. These contours were obtained at the 50th iteration. The last two panels in the bottom row show the contours predicted by the initial model before active learning began and the ground truth, respectively.
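The flow in Figure 1 amounts to a pool-based active learning loop in which "annotation" is simulated by revealing ground-truth masks. A minimal sketch, assuming hypothetical `train` and `query` callables supplied by the user (the 10 training epochs per iteration from Figure 2 would live inside `train`, and `query` could be any of the pure or hybrid strategies); `run_active_learning` is likewise a hypothetical name.

```python
from typing import Callable

import numpy as np


def run_active_learning(
    n_pool: int,
    n_initial: int,
    n_per_iter: int,
    n_iters: int,
    query: Callable[[np.ndarray, int], np.ndarray],  # (unlabeled idx, k) -> k chosen idx
    train: Callable[[np.ndarray], None],             # (re)trains the model on labeled idx
    seed: int = 0,
) -> tuple[np.ndarray, list[int]]:
    rng = np.random.default_rng(seed)
    # Random initial labeled pool D_L; everything else is unlabeled.
    labeled = rng.choice(n_pool, size=n_initial, replace=False)
    sizes = [len(labeled)]
    for _ in range(n_iters):
        train(labeled)
        unlabeled = np.setdiff1d(np.arange(n_pool), labeled)
        picked = query(unlabeled, n_per_iter)
        # Simulated oracle: ground-truth masks already exist, so "annotating"
        # the picked images just means moving their indices into D_L.
        labeled = np.concatenate([labeled, picked])
        sizes.append(len(labeled))
    train(labeled)  # final model on the full acquired set
    return labeled, sizes
```

With a random `query`, this loop reproduces the random baseline the hybrids are compared against.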
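The Figure 3 setup follows from simple bookkeeping. Writing $n_0$ for the initial size of $\mathcal{D}_L$, $k$ for the number of samples acquired per iteration, and $T$ for the number of iterations in Figure 2 (symbols introduced here for illustration), the final pool size must stay fixed when the iteration count is halved:

$$
|\mathcal{D}_L^{\text{final}}| = n_0 + T k = n_0' + \tfrac{T}{2}\,k
\quad\Longrightarrow\quad
n_0' = n_0 + \tfrac{T}{2}\,k .
$$

For example, with the cardiac setting of $k = 16$ and a hypothetical $T = 10$, the larger initial pool would hold $n_0 + 80$ images.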
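Figure 4's protocol, fitting the projection on training features and then applying that same fitted map to validation features, can be sketched with a minimal NumPy PCA. This is an illustrative stand-in, not the authors' code: the class name is hypothetical, and features are assumed to be one vector per image. The `UMAP` class in `umap-learn` follows the same pattern, with `fit` on the training features and `transform` on the validation features.

```python
import numpy as np


class MinimalPCA:
    """Fit principal axes on training features; project any features onto them."""

    def __init__(self, n_components: int = 2):
        self.n_components = n_components

    def fit(self, X: np.ndarray) -> "MinimalPCA":
        self.mean_ = X.mean(axis=0)
        # Rows of Vt are principal directions, ordered by decreasing singular value.
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[: self.n_components]
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        # Validation features reuse the training mean and axes; nothing is refitted.
        return (X - self.mean_) @ self.components_.T
```

Because the validation points never influence the fitted axes, their overlap with the training cloud (as reported in Figure 4) is a fair check that the two sets share a feature distribution.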
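The overlap between the predicted contours and the ground truth in Figure 5 is what the Dice coefficient, $2|P \cap G| / (|P| + |G|)$, quantifies. A minimal sketch for binary masks, for example with the peripheral and central prostate zones merged into one foreground label as in the figure; the small `eps` smoothing term is an assumption added so that two empty masks score 1 instead of dividing by zero.

```python
import numpy as np


def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient 2|P∩G| / (|P| + |G|) for two binary masks of equal shape."""
    p = pred.astype(bool)
    g = target.astype(bool)
    # Pixels labeled foreground in both the prediction and the ground truth.
    intersection = np.logical_and(p, g).sum()
    return float((2.0 * intersection + eps) / (p.sum() + g.sum() + eps))
```

Two 4-pixel masks that share 2 pixels score $2 \cdot 2 / (4 + 4) = 0.5$.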