SingleStrip: learning skull-stripping from a single labeled example

Bella Specktor-Fadida, Malte Hoffmann

TL;DR

The paper addresses the labeled-data bottleneck in medical image segmentation, focusing on skull-stripping with minimal manual labeling. It combines data synthesis based on domain randomization with semi-supervised self-training and introduces autoencoder (AE)-based quality control (QC) to select reliable pseudo-labels. On 3D MRI skull-stripping, the method achieves competitive performance on out-of-distribution data from only a single labeled example, and AE-based QC predicts pseudo-label quality better than test-time augmentation (TTA). The approach could reduce labeling effort for new anatomical structures or imaging modalities and may extend to more complex segmentation problems.
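The claim that AE-based QC predicts label quality better than TTA concerns how well a QC score ranks pseudo-labels by their true accuracy; the abstract frames it as correlation with segmentation accuracy. One way to quantify this is a rank correlation between each method's QC score and the ground-truth Dice on cases where reference masks exist. The sketch below assumes Spearman correlation, which the summary does not name, and the helper names `rankdata` and `spearman` are hypothetical.

```python
import numpy as np


def rankdata(a):
    """Ranks starting at 1; tied values share the average of their ranks."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a), dtype=float)
    i = 0
    while i < len(a):
        # Extend j over the run of values tied with sorted_a[i].
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks


def spearman(qc_scores, true_dice):
    """Spearman correlation between a QC score and ground-truth Dice per case."""
    return float(np.corrcoef(rankdata(qc_scores), rankdata(true_dice))[0, 1])
```

Computing `spearman(ae_scores, dice_scores)` and `spearman(tta_scores, dice_scores)` yields one coefficient per QC method, and the larger one identifies the better predictor of label quality. If a score measures disagreement (higher means worse), negate it first so both correlations share a sign convention.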

Abstract

Deep learning segmentation relies heavily on labeled data, but manual labeling is laborious and time-consuming, especially for volumetric images such as brain magnetic resonance imaging (MRI). While recent domain-randomization techniques alleviate the dependency on labeled data by synthesizing diverse training images from label maps, they offer limited anatomical variability when very few label maps are available. Semi-supervised self-training addresses label scarcity by iteratively incorporating model predictions into the training set, enabling networks to learn from unlabeled data. In this work, we combine domain randomization with self-training to train three-dimensional skull-stripping networks using as little as a single labeled example. First, we automatically bin voxel intensities, yielding labels we use to synthesize images for training an initial skull-stripping model. Second, we train a convolutional autoencoder (AE) on the labeled example and use its reconstruction error to assess the quality of brain masks predicted for unlabeled data. Third, we select the top-ranking pseudo-labels to fine-tune the network, achieving skull-stripping performance on out-of-distribution data that approaches models trained with more labeled images. We compare AE-based ranking to consistency-based ranking under test-time augmentation, finding that the AE approach yields a stronger correlation with segmentation accuracy. Our results highlight the potential of combining domain randomization and AE-based quality control to enable effective semi-supervised segmentation from extremely limited labeled data. This strategy may ease the labeling burden that slows progress in studies involving new anatomical structures or emerging imaging techniques.
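The first step (binning voxel intensities, then synthesizing training images from the resulting labels) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it fits a one-dimensional Gaussian mixture model (GMM) to the labeled image's voxel intensities by expectation-maximization, assigns each voxel to its most probable of `c` classes, and synthesizes a domain-randomized image by drawing a random mean intensity and noise level per class. The function names, the quantile initialization, and the intensity ranges are hypothetical choices. Domain-randomization pipelines typically also randomize spatial deformations, blurring, and bias fields; those are omitted here.

```python
import numpy as np


def _log_joint(x, weights, means, var):
    """Per-voxel log of (mixture weight * Gaussian density) for each component."""
    return (np.log(weights)
            - 0.5 * np.log(2.0 * np.pi * var)
            - 0.5 * (x[:, None] - means) ** 2 / var)


def fit_gmm_1d(x, c, n_iter=100):
    """Fit a c-component 1D Gaussian mixture to intensities x with EM."""
    x = np.asarray(x, dtype=float).ravel()
    # Deterministic initialization: means at evenly spaced quantiles.
    means = np.quantile(x, (np.arange(c) + 0.5) / c)
    var = np.full(c, x.var() / c + 1e-6)
    weights = np.full(c, 1.0 / c)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each voxel.
        log_p = _log_joint(x, weights, means, var)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, var


def bin_intensities(image, c, n_iter=100):
    """Assign each voxel to its most probable GMM class; label 0 is darkest."""
    weights, means, var = fit_gmm_1d(image, c, n_iter)
    x = np.asarray(image, dtype=float).ravel()
    hard = np.argmax(_log_joint(x, weights, means, var), axis=1)
    # Relabel components so that label order follows mean intensity.
    rank = np.empty(c, dtype=int)
    rank[np.argsort(means)] = np.arange(c)
    return rank[hard].reshape(np.shape(image))


def synthesize_image(labels, c, rng, max_std=0.1):
    """Domain-randomized image: a random mean and noise level for every class."""
    means = rng.uniform(0.0, 1.0, size=c)
    stds = rng.uniform(0.0, max_std, size=c)
    noise = rng.standard_normal(labels.shape)
    return means[labels] + stds[labels] * noise
```

In a training loop, one would call `synthesize_image` with a fresh random state at every iteration and pair each synthetic image with the example's brain mask as the target, so the network sees a new contrast each time while the anatomy stays fixed. That fixed anatomy is the limited variability the abstract points to, which the self-training stage is meant to address.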

Paper Structure

This paper contains 5 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Method overview. To learn skull-stripping from a single labeled image, we first fit a Gaussian mixture model (GMM), assigning voxel intensities to $c$ classes. From these, we synthesize diverse images to train a skull-stripping U-Net. In parallel, we train an autoencoder (AE) to reconstruct brain masks for quality control. Assuming high-quality masks change least in AE reconstruction, we use both networks to skull-strip an unlabeled dataset and retain the least-changing predictions to fine-tune the U-Net via GMM synthesis. This scheme can be repeated several times as needed.
  • Figure 2: Skull-stripping examples. Methods SSL-TTA and SSL-AE fine-tune a U-Net trained with a "Single example" using pseudo-labels selected via test-time augmentation and autoencoder reconstruction, respectively. Each row shows a different ASL subject.
  • Figure 3: Skull-stripping accuracy for in-distribution FSM and out-of-distribution ASL images. We compare training with a single example (SL-1), further fine-tuning with pseudo-labels selected by autoencoder reconstruction (SSL-AE) or with the pseudo-labels most similar to the ground-truth brain masks (SSL-T), and training with 16 examples (SL-16). Higher Dice scores and lower Hausdorff distances are better.
  • Figure 4: Ablation experiments comparing training with a single example (SL-1) to further fine-tuning using pseudo-labels selected with autoencoder reconstruction (SSL-AE), test-time augmentation (SSL-TTA), or via comparison to ground-truth brain masks (SSL-T). In addition, we test retraining from scratch (SSL-T no fine-tuning).
  • Figure 5: Autoencoder (AE) reconstructions. Left: input brain mask. Center: AE output superimposed with the input. Right: ground-truth mask superimposed with the input.
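The selection rule in Figure 1 (retain the predictions that change least under AE reconstruction) can be sketched as below. This is a hypothetical NumPy illustration, not the paper's code: the trained convolutional AE is abstracted as any callable that maps a binary mask to per-voxel reconstruction probabilities, and "change" is measured as the Dice overlap between a predicted mask and its thresholded reconstruction, which is one plausible reconstruction-error measure rather than the paper's confirmed metric. The 3x3x3 box filter `box_filter_ae` is a toy stand-in for a learned AE, used only so the example runs; like an AE trained on brain masks, it cannot reproduce isolated speckle, so speckled masks score lower.

```python
import numpy as np


def dice(a, b, eps=1e-8):
    """Dice overlap between two binary masks (0 when both are empty)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + eps)


def ae_quality(mask, autoencoder, threshold=0.5):
    """Higher means the AE changed the mask less, i.e. a more plausible mask."""
    recon = np.asarray(autoencoder(mask.astype(np.float32))) >= threshold
    return dice(mask, recon)


def select_pseudo_labels(masks, autoencoder, k):
    """Return indices of the k masks the AE changes least, plus all scores."""
    scores = np.array([ae_quality(m, autoencoder) for m in masks])
    order = np.argsort(-scores, kind="stable")
    return order[:k].tolist(), scores


def box_filter_ae(x):
    """Hypothetical stand-in for a trained AE: a 3x3x3 mean filter.

    It erases isolated voxels and rounds off edges, loosely mimicking an AE
    that can only reproduce smooth, compact shapes.
    """
    p = np.pad(x, 1, mode="constant")
    out = np.zeros(x.shape, dtype=float)
    for dz in range(3):
        for dy in range(3):
            for dx in range(3):
                out += p[dz:dz + x.shape[0], dy:dy + x.shape[1], dx:dx + x.shape[2]]
    return out / 27.0


# Toy usage: a compact cube versus the same cube with scattered false positives.
clean = np.zeros((16, 16, 16), dtype=bool)
clean[3:13, 3:13, 3:13] = True
noisy = clean.copy()
noisy[0, ::2, ::2] = True
selected, scores = select_pseudo_labels([noisy, clean], box_filter_ae, k=1)
print(selected)  # [1]
```

Keeping the scorer separate from the ranking makes the comparison in Figure 4 easy to mimic: replacing `ae_quality` with a test-time-augmentation consistency score (for example, mean Dice between predictions under flips and the unaugmented prediction) would give a TTA-based selector in the spirit of SSL-TTA, again under these assumptions.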