Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation

Xiaoyang Chen, Hao Zheng, Yuemeng Li, Yuncong Ma, Liang Ma, Hongming Li, Yong Fan

TL;DR

The paper tackles the data bottleneck in versatile medical image segmentation by proposing a weakly supervised framework that learns from multi-source datasets with partial or sparse labels. It combines model self-disambiguation via ambiguity-aware losses, entropy-based prior knowledge regularization, and hierarchical sampling to balance cross-domain data. The approach uses a 3D TransUNet backbone and reports state-of-the-art performance (e.g., a Dice similarity coefficient of $\mathrm{DSC} \approx 88.7\%$) on eight-source abdominal segmentation, while enabling single-pass inference across all structures and reducing annotation costs. The work demonstrates robust generalization across modalities and datasets and provides a practical path toward scalable, cost-efficient deployment of universal segmentation models.

Abstract

A versatile medical image segmentation model applicable to images acquired with diverse equipment and protocols can facilitate model deployment and maintenance. However, building such a model typically demands a large, diverse, and fully annotated dataset, which is challenging to obtain due to the labor-intensive nature of data curation. To address this challenge, we propose a cost-effective alternative that harnesses multi-source data with only partial or sparse segmentation labels for training, substantially reducing the cost of developing a versatile model. We devise strategies for model self-disambiguation, prior knowledge incorporation, and imbalance mitigation to tackle challenges associated with inconsistently labeled multi-source data, including label ambiguity and modality, dataset, and class imbalances. Experimental results on a multi-modal dataset compiled from eight different sources for abdominal structure segmentation have demonstrated the effectiveness and superior performance of our method compared to state-of-the-art alternative approaches. We anticipate that its cost-saving features, which optimize the utilization of existing annotated data and reduce annotation efforts for new data, will have a significant impact in the field.

Paper Structure (20 sections, 6 equations, 5 figures, 16 tables)

This paper contains 20 sections, 6 equations, 5 figures, 16 tables.

Figures (5)

  • Figure 1: Illustrations of (a) fully labeled, (b) partially labeled, and (c) sparsely labeled images. The fully labeled image contains annotations for all anatomical structures of interest, the partially labeled image includes labels for a subset, and the sparsely labeled image provides annotations for only a fraction of the slices and structures. Note that annotated structures are fully marked within a particular volume (b) or slice (c).
  • Figure 2: Overview of our approach. It trains a model by using hierarchical sampling for training example generation, 3D TransUNet as its base network, two ambiguity-aware losses and a prior knowledge-based entropy minimization regularization term for guidance.
  • Figure 3: (a): Training and testing image composition. (b): Annotated anatomical structures in different datasets.
  • Figure 4: Visual comparison between the ground truth and the predictions generated by DoDNet, CLIP-driven and the proposed method on subjects from different datasets. For a clearer view of detailed differences, zoom in to closely examine the results.
  • Figure 5: Visual comparisons between the ground truth and predictions from models trained with 20% slices of the axial view, 100% slices of the axial view (loss is computed slice-wise to emulate sparsely labeled data), and hybrid data (the entirety of AMOS, BTCV, and FLARE22 is utilized, while 20% slices of the axial view are taken from other datasets for training) on subjects from various datasets. For a clearer view of detailed differences, zoom in to closely examine the results.
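The label ambiguity that Figures 1 and 2 refer to can be made concrete with a small sketch. In a partially labeled image, a voxel marked as background may in fact belong to a structure that this particular dataset did not annotate. One common way to handle this, shown below, is a marginal cross-entropy that merges the predicted probabilities of all unannotated classes into the background before taking the log. This is an illustrative assumption, not the paper's formulation: the exact form of its two ambiguity-aware losses is not given in this section, and the names `marginal_cross_entropy` and `labeled_classes` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of per-class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_cross_entropy(logits, target, labeled_classes):
    """Cross-entropy for one voxel of a partially labeled image (hypothetical sketch).

    Class 0 is background. `labeled_classes` is the set of foreground
    classes annotated in the dataset this voxel comes from. A voxel whose
    label is background may belong to any class outside that set, so the
    probabilities of those classes are summed into the background.
    """
    probs = softmax(logits)
    if target != 0:
        if target not in labeled_classes:
            raise ValueError("target class is not annotated in this dataset")
        # Annotated foreground voxel: ordinary cross-entropy.
        return -math.log(probs[target])
    # Ambiguous background voxel: true background or any unannotated class.
    unlabeled = [c for c in range(1, len(probs)) if c not in labeled_classes]
    merged_background = probs[0] + sum(probs[c] for c in unlabeled)
    return -math.log(merged_background)


# Four classes (background plus three structures); only class 1 is annotated.
logits = [0.0, 0.0, 10.0, 0.0]  # the model strongly predicts unannotated class 2
ambiguous = marginal_cross_entropy(logits, target=0, labeled_classes={1})
strict = marginal_cross_entropy(logits, target=0, labeled_classes={1, 2, 3})
print(ambiguous < strict)
```

With only class 1 annotated, a confident prediction of class 2 on a "background" voxel incurs almost no penalty, whereas treating the image as fully labeled would penalize it heavily. When every class is annotated, the function reduces to ordinary cross-entropy.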