Segmenting Medical Images with Limited Data

Zhaoshan Liu, Qiujie Lv, Chau Hung Lee, Lei Shen

TL;DR

This work tackles medical image segmentation with limited labeled data by introducing DEMS, a semi-supervised encoder-decoder framework that combines an online automatic augmenter (OAA), residual robustness enhancement (RRE) blocks, and a sensitivity loss that enforces cross-decoder consistency. DEMS trains with multiple decoders and a multi-term loss, using a Gaussian warm-up to stabilize learning from unlabeled data. Across four ultrasound datasets, DEMS improves substantially over state-of-the-art methods, including a 16.85% Dice-score improvement over U-Net under extreme data shortage, and performs robustly on public datasets, often with lower training time. These results suggest that DEMS is practical for fast, accurate medical image segmentation when annotations and data are scarce.
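The Gaussian warm-up mentioned above can be illustrated with a minimal sketch. This section does not give the exact schedule DEMS uses, so the code below assumes the ramp commonly used in consistency-based semi-supervised learning, $w(t) = w_{\max}\exp(-5(1 - t/T)^2)$. The function name `gaussian_warmup`, the constant 5, and the parameters `warmup_steps` and `max_weight` are illustrative assumptions, not values taken from the paper.

```python
import math


def gaussian_warmup(step: int, warmup_steps: int, max_weight: float = 1.0) -> float:
    """Ramp a loss weight from near zero up to max_weight along a Gaussian curve.

    Assumed form: max_weight * exp(-5 * (1 - progress)^2). This is a common
    choice in the literature, not necessarily the schedule used by DEMS.
    """
    if warmup_steps <= 0:
        return max_weight
    # Fraction of the warm-up completed, clamped to [0, 1].
    progress = min(max(step / warmup_steps, 0.0), 1.0)
    return max_weight * math.exp(-5.0 * (1.0 - progress) ** 2)


if __name__ == "__main__":
    # The weight starts small, so noisy early predictions on unlabeled
    # images contribute little, and it reaches max_weight at the end.
    for step in (0, 25, 50, 75, 100):
        print(step, round(gaussian_warmup(step, warmup_steps=100), 4))
```

A schedule like this would typically scale the unlabeled-data term of the loss, so that supervised signal dominates early training while the decoders are still unreliable.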

Abstract

While computer vision has proven valuable for medical image segmentation, its application faces challenges such as limited dataset sizes and the difficulty of effectively leveraging unlabeled images. To address these challenges, we present a novel semi-supervised, consistency-based approach termed the data-efficient medical segmenter (DEMS). DEMS features an encoder-decoder architecture and incorporates the proposed online automatic augmenter (OAA) and residual robustness enhancement (RRE) blocks. The OAA augments input data with various image transformations, thereby diversifying the dataset to improve generalization. The RRE enriches feature diversity and introduces perturbations to create varied inputs for different decoders, thereby providing enhanced variability. Moreover, we introduce a sensitivity loss to further enhance consistency across different decoders and stabilize the training process. Extensive experimental results on both our own and three public datasets affirm the effectiveness of DEMS. Under extreme data shortage scenarios, DEMS achieves 16.85% and 10.37% improvement in Dice score compared with U-Net and the top-performing state-of-the-art method, respectively. Given its superior data efficiency, DEMS could present significant advancements in medical segmentation under small data regimes. The project homepage can be accessed at https://github.com/NUS-Tim/DEMS.
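The idea behind the OAA, resampling a transformation for each image and mask pair at every epoch, can be sketched as follows. This is a toy illustration under stated assumptions rather than the authors' implementation: the augmentation space here contains only four geometric operations, images are plain nested lists, and the names `AUGMENTATION_SPACE` and `augment_pair` are hypothetical. The one property the sketch is meant to show is that the same sampled operation is applied to both the image and its mask, so the labels stay aligned with the pixels.

```python
import random
from typing import Callable, List, Tuple

Grid = List[List[float]]


def identity(g: Grid) -> Grid:
    return [row[:] for row in g]


def hflip(g: Grid) -> Grid:
    # Mirror each row left to right.
    return [row[::-1] for row in g]


def vflip(g: Grid) -> Grid:
    # Reverse the order of the rows.
    return [row[:] for row in g[::-1]]


def rot90(g: Grid) -> Grid:
    # Rotate 90 degrees clockwise.
    return [list(row) for row in zip(*g[::-1])]


# Hypothetical augmentation space; the paper's actual space is not listed here.
AUGMENTATION_SPACE: List[Callable[[Grid], Grid]] = [identity, hflip, vflip, rot90]


def augment_pair(image: Grid, mask: Grid, rng: random.Random) -> Tuple[Grid, Grid]:
    """Sample one operation and apply it to both the image and its mask."""
    op = rng.choice(AUGMENTATION_SPACE)
    return op(image), op(mask)


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[0.9, 0.1], [0.2, 0.3]]
    mask = [[1, 0], [0, 0]]
    # A fresh operation is drawn at each epoch, so the encoder sees
    # a different view of the same sample as training proceeds.
    for epoch in range(3):
        aug_image, aug_mask = augment_pair(image, mask, rng)
        print(epoch, aug_image, aug_mask)
```

Because augmentation happens online, no augmented copies need to be stored; the diversity comes from drawing a new operation each time a sample is visited.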

Paper Structure

This paper contains 20 sections, 10 equations, 10 figures, 5 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Detailed architecture of DEMS. DEMS consists of an encoder $E$, a main decoder $D_{m}$, and three auxiliary decoders $D_{a1}$, $D_{a2}$, and $D_{a3}$, and incorporates the proposed online automatic augmenter (OAA) and residual robustness enhancement (RRE) blocks. The OAA applies varying image transformations to the visual input, diversifying the data to improve generalization. The RRE block enriches feature diversity and introduces perturbations to create varied decoder inputs, thereby increasing variability. The loss function comprises the fusion loss $L_{f}$, the sensitivity loss $L_{s}$, and the unsupervised loss $L_{u}$. The sensitivity loss further enhances consistency across the decoders and stabilizes model training; $L_{s}$ is formulated from the XOR operation applied to the main and auxiliary decoder pairs. The XOR operation is depicted with an overlay of blue and orange dash-dot lines. GT stands for the ground truth.
  • Figure 2: Workflow of the proposed OAA. Each input image and its corresponding mask undergo diverse data augmentation (DA) transformations $O$ sampled from augmentation spaces $A$ at each new epoch $e$. The encoder therefore receives different inputs in successive epochs as training proceeds, which enhances generalization. The gray dash-dot line depicts the training progress.
  • Figure 3: Detailed structure of the RRE block and the connections between the encoder and the decoders. The RRE block has two distinct input-output pairs, marked by rhombus-shaped and circle-shaped arrow ends. It mainly comprises a residual connection, depthwise convolution (DwConv), pointwise convolution (PwConv), and a feature perturbation injection (FPI) block. The encoder outputs at successive blocks are denoted $f_{1}$, $f_{2}$, $f_{3}$, $f_{4}$, and $f_{5}$. Streams starting at $f_{1}$, $f_{2}$, $f_{3}$, and $f_{4}$ are drawn with black arrows, and streams starting at $f_{5}$ with teal arrows. For clarity, only the skip connections between the encoder and the main decoder are depicted. BN denotes the batch normalization layer.
  • Figure 4: Predicted masks across DEMS and SOTA methods on the four datasets using 40% labeled data.
  • Figure 5: XOR outputs of predicted and GT masks across DEMS and SOTA methods on the four datasets using 40% labeled data.
  • ...and 5 more figures
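
The XOR operation that appears in Figures 1 and 5 can be illustrated on binary masks. The sketch below only shows how XOR exposes the pixels on which two masks disagree, either a main and an auxiliary decoder prediction (Figure 1) or a prediction and the ground truth (Figure 5). It is not the paper's sensitivity loss $L_{s}$, whose formula is not given in this section; the helper names `xor_map`, `disagreement_ratio`, and `mean_disagreement` are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # binary mask with entries 0 or 1


def xor_map(a: Mask, b: Mask) -> Mask:
    """Return a mask that is 1 exactly where the two inputs differ."""
    return [[x ^ y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def disagreement_ratio(a: Mask, b: Mask) -> float:
    """Fraction of pixels on which the two masks disagree."""
    diff = xor_map(a, b)
    total = sum(len(row) for row in diff)
    return sum(sum(row) for row in diff) / total


def mean_disagreement(main: Mask, auxiliaries: List[Mask]) -> float:
    """Average disagreement between the main mask and each auxiliary mask."""
    return sum(disagreement_ratio(main, aux) for aux in auxiliaries) / len(auxiliaries)


if __name__ == "__main__":
    main_pred = [[1, 0], [1, 1]]
    aux_preds = [
        [[1, 1], [0, 1]],  # differs from main_pred on two of four pixels
        [[1, 0], [1, 1]],  # identical to main_pred
        [[1, 0], [1, 0]],  # differs from main_pred on one of four pixels
    ]
    print(xor_map(main_pred, aux_preds[0]))
    print(mean_disagreement(main_pred, aux_preds))
```

A hard XOR on thresholded masks is not differentiable, so a trainable loss would need a soft relaxation on predicted probabilities; how DEMS handles this is left to the full paper.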