An Active Learning Pipeline for Biomedical Image Instance Segmentation with Minimal Human Intervention

Shuo Zhao, Yu Zhou, Jianxu Chen

TL;DR

This paper addresses the data-label bottleneck in biomedical image segmentation by proposing a data-centric active-learning pipeline that integrates foundation-model pseudo-labeling with nnU-Net self-configuration. The approach bootstraps training with CellSAM-generated pseudo-labels, uses MAE-based self-supervised features to select a representative core-set, and applies microSAM-assisted manual labeling to fine-tune nnU-Net. On the 3D MitoEM mitochondria segmentation task, the method achieves competitive performance with as little as 12.5% of manual annotations, approaching 90% of full-label results and outperforming random-core-set baselines. The framework offers a practical pathway to deploy state-of-the-art segmentation with substantially reduced labeling effort and potential for continuous learning in dynamic biomedical datasets.

Abstract

Biomedical image segmentation is critical for precise structure delineation and downstream analysis. Traditional methods often struggle with noisy data, while deep learning models such as U-Net have set new benchmarks in segmentation performance. nnU-Net further automates model configuration, making it adaptable across datasets without extensive tuning. However, it requires a substantial amount of annotated data for cross-validation, posing a challenge when only raw images but no labels are available. Large foundation models offer zero-shot generalizability, but may underperform on specific datasets with unique characteristics, limiting their direct use for analysis. This work addresses these bottlenecks by proposing a data-centric AI workflow that leverages active learning and pseudo-labeling to combine the strengths of traditional neural networks and large foundation models while minimizing human intervention. The pipeline starts by generating pseudo-labels from a foundation model, which are then used for nnU-Net's self-configuration. Subsequently, a representative core-set is selected for minimal manual annotation, enabling effective fine-tuning of the nnU-Net model. This approach significantly reduces the need for manual annotations while maintaining competitive performance, providing an accessible solution for biomedical researchers to apply state-of-the-art AI techniques in their segmentation tasks. The code is available at https://github.com/MMV-Lab/AL_BioMed_img_seg.
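The core-set selection step can be illustrated with a minimal sketch. The abstract does not state which selection rule the pipeline uses, so this example assumes k-center greedy (farthest-point) sampling, a common core-set strategy. Random vectors stand in for the MAE features, and the function name `k_center_greedy` and the sizes used are hypothetical.

```python
import math
import random


def k_center_greedy(features, budget, seed=0):
    """Return `budget` indices chosen by farthest-point sampling.

    Assumption: this is a generic core-set heuristic, not necessarily
    the selection rule used in the paper.
    """
    n = len(features)
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and len(features)")

    # Start from a reproducible random sample.
    first = random.Random(seed).randrange(n)
    selected = [first]

    # Distance from every sample to its nearest selected center so far.
    min_dist = [math.dist(f, features[first]) for f in features]

    while len(selected) < budget:
        # Pick the sample that is farthest from all current centers.
        nxt = max(range(n), key=min_dist.__getitem__)
        selected.append(nxt)
        min_dist = [
            min(d, math.dist(f, features[nxt]))
            for d, f in zip(min_dist, features)
        ]
    return selected


# Hypothetical stand-in for per-patch MAE embeddings: 80 vectors of length 16.
rng = random.Random(42)
features = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(80)]

# Annotate 12.5% of the samples, mirroring the smallest budget in the TL;DR.
budget = int(0.125 * len(features))
core_set = k_center_greedy(features, budget)
print(len(core_set), sorted(core_set))
```

Only the samples indexed by `core_set` would go to manual annotation; the rest keep their pseudo-labels. Farthest-point sampling favors coverage of the feature space, which is one way to read "representative", but other criteria (for example clustering-based selection) would fit the same description.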

Paper Structure

This paper contains 7 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1.1: An active learning pipeline for 3D unlabeled biomedical image instance segmentation.
  • Figure 1.2: Labels for the train set: (a) pseudo-labels from CellSAM and (b) manual labels. Segmentations for the test set: (c) pre-trained with pseudo-labels, (d) fine-tuned with 12.5% manual labels, (e) fine-tuned with 100% manual labels, and (f) ground truth (GT). Yellow arrows in (a) and (b) mark differences between the pseudo-labels and the manual labels, while white arrows in (c)–(f) highlight discrepancies between the segmentations and the GT.