Labeled-to-Unlabeled Distribution Alignment for Partially-Supervised Multi-Organ Medical Image Segmentation

Xixi Jiang, Dong Zhang, Xiang Li, Kangyi Liu, Kwang-Ting Cheng, Xin Yang

TL;DR

The paper tackles partially-supervised multi-organ medical image segmentation, where each dataset labels only one organ, which causes a distribution mismatch between labeled and unlabeled pixels. It introduces LTUDA, a labeled-to-unlabeled distribution alignment framework that combines cross-set data augmentation with prototype-based distribution alignment inside a teacher-student FixMatch framework. Cross-set augmentation performs region-level mixing between labeled and unlabeled organs to reduce the distribution discrepancy and enrich the training set, while prototype alignment reduces intra-class variation and increases the separation between unlabeled foreground and background. On LSPL (the union of LiTS, MSD-Spleen, KiTS, and NIH82) and AbdomenCT-1K, LTUDA outperforms state-of-the-art partially-supervised methods and even surpasses fully-supervised methods. The source code is publicly available.

Abstract

Partially-supervised multi-organ medical image segmentation aims to develop a unified semantic segmentation model by utilizing multiple partially-labeled datasets, with each dataset providing labels for a single class of organs. However, the limited availability of labeled foreground organs and the absence of supervision to distinguish unlabeled foreground organs from the background pose a significant challenge, which leads to a distribution mismatch between labeled and unlabeled pixels. Although existing pseudo-labeling methods can be employed to learn from both labeled and unlabeled pixels, they are prone to performance degradation in this task, as they rely on the assumption that labeled and unlabeled pixels have the same distribution. In this paper, to address the problem of distribution mismatch, we propose a labeled-to-unlabeled distribution alignment (LTUDA) framework that aligns feature distributions and enhances discriminative capability. Specifically, we introduce a cross-set data augmentation strategy, which performs region-level mixing between labeled and unlabeled organs to reduce distribution discrepancy and enrich the training set. Besides, we propose a prototype-based distribution alignment method that implicitly reduces intra-class variation and increases the separation between the unlabeled foreground and background. This can be achieved by encouraging consistency between the outputs of two prototype classifiers and a linear classifier. Extensive experimental results on the AbdomenCT-1K dataset and a union of four benchmark datasets (including LiTS, MSD-Spleen, KiTS, and NIH82) demonstrate that our method outperforms the state-of-the-art partially-supervised methods by a considerable margin, and even surpasses the fully-supervised methods. The source code is publicly available at https://github.com/xjiangmed/LTUDA.
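
The region-level mixing mentioned in the abstract can be illustrated with a CutMix-style paste between two images. The following is a minimal sketch, not the authors' implementation: the function names `random_box` and `cutmix_pair`, the box-size fractions, and the choice to mix the masks in the same way as the images are all assumptions made for illustration.

```python
import numpy as np


def random_box(height, width, rng, min_frac=0.25, max_frac=0.5):
    """Sample a box (top, left, h, w) lying fully inside the image.

    The size fractions are illustrative assumptions, not values from the paper.
    """
    h = int(rng.integers(int(height * min_frac), int(height * max_frac) + 1))
    w = int(rng.integers(int(width * min_frac), int(width * max_frac) + 1))
    top = int(rng.integers(0, height - h + 1))
    left = int(rng.integers(0, width - w + 1))
    return top, left, h, w


def cutmix_pair(x_a, x_b, y_a, y_b, box):
    """Paste the box region of (x_b, y_b) into the same position of (x_a, y_a).

    x_* are images and y_* are per-pixel masks (for example pseudo-labels);
    the last two axes of every array are (height, width).
    """
    top, left, h, w = box
    x_mix, y_mix = x_a.copy(), y_a.copy()
    x_mix[..., top:top + h, left:left + w] = x_b[..., top:top + h, left:left + w]
    y_mix[..., top:top + h, left:left + w] = y_b[..., top:top + h, left:left + w]
    return x_mix, y_mix


rng = np.random.default_rng(0)
x_a, x_b = np.zeros((64, 64)), np.ones((64, 64))
y_a, y_b = np.zeros((64, 64), dtype=int), np.ones((64, 64), dtype=int)
box = random_box(64, 64, rng)
x_mix, y_mix = cutmix_pair(x_a, x_b, y_a, y_b, box)
```

Because the image and its mask are cut with the same box, every pasted pixel keeps the label it had in its source image, so the mixed sample stays consistent for supervision.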

Paper Structure (16 sections, 9 equations, 10 figures, 9 tables, 1 algorithm)

This paper contains 16 sections, 9 equations, 10 figures, 9 tables, 1 algorithm.

Figures (10)

  • Figure 1: Comparisons of t-SNE feature visualization on the toy dataset consisting of four partially-labeled sub-datasets. The feature distribution of labeled and unlabeled pixels for different classes is visualized. For each foreground category, only one sub-dataset provides a labeled set, while the other three provide unlabeled sets. Since each sub-dataset does not provide the true label of the background, the background is completely unlabeled. We have superimposed the feature centers of the labeled set and unlabeled set, i.e., labeled prototypes and unlabeled prototypes, of each foreground category on the feature distribution. Additionally, we visualized the feature center of the background classes across all subsets. (a) Baseline model (trained on labeled pixels). The labeled prototype and unlabeled prototypes of the foreground classes are not aligned. (b) Baseline model with cross-set data augmentation (CDA). The CDA strategy effectively reduces the distributional discrepancy between labeled and unlabeled pixels for the foreground classes. (c) Our proposed method. The labeled prototype and unlabeled prototypes of each foreground class almost overlap.
  • Figure 2: (a) The overall framework of the proposed LTUDA method, which consists of cross-set data augmentation and prototype-based distribution alignment. (b) Details of the prototype-based distribution alignment module. Our method is built on the popular teacher-student framework and applies weak (rotation and scaling) and strong augmentation (cross-set region-level mixing) to the input images of the teacher and student models, respectively. The linear classifier refers to the linear threshold-based classifier described in the Preliminary section. Two prototype classifiers are introduced in the student model, and the predictions of the teacher model and partial labels are combined as pseudo-labels to supervise the outputs of the three classifiers in the student model. The term "copy" denotes that the labeled prototype of the background class is set to be equal to the unlabeled prototype.
  • Figure 3: Visualizations of different strongly augmented images generated by CutMix. Both (a) and (b) paste a patch cropped from $x_w^b$ into the same position in $x_w^a$; the box coordinates and sizes differ between (a) and (b).
  • Figure 4: Visualizations of LSPL. Examples are from the LiTS, MSD-Spleen, KiTS, and NIH82 datasets, respectively, arranged from left to right. (a) Single-organ annotations originally provided by the four benchmark datasets. (b) Full annotations of four organs. (c) to (g) are the segmentation results of different methods. The white frame highlights the better predictions of our method.
  • Figure 5: Ablation results. (a) The number of prototypes per class. (b) The number of strong views.
  • ...and 5 more figures
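
The captions of Figures 1 and 2 describe prototypes as class-wise feature centers and mention prototype classifiers. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a single prototype per class (Figure 5 indicates the paper studies several per class), cosine similarity as the scoring rule, and hypothetical function names `class_prototypes` and `prototype_logits`.

```python
import numpy as np


def class_prototypes(features, labels, num_classes):
    """Return one prototype per class: the mean feature of its pixels.

    features: (N, D) pixel features; labels: (N,) class ids, where any id
    outside [0, num_classes) is ignored. Absent classes get a zero row.
    """
    protos = np.zeros((num_classes, features.shape[1]), dtype=features.dtype)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos


def prototype_logits(features, protos, eps=1e-8):
    """Score each pixel by cosine similarity to every class prototype."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    p = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + eps)
    return f @ p.T  # shape (N, num_classes)


# Toy features: class 0 clusters near (1, 0), class 1 near (0, 1).
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
protos = class_prototypes(features, labels, num_classes=2)
pred = prototype_logits(features, protos).argmax(axis=1)
```

In this sketch, a labeled prototype would be computed from pixels with ground-truth labels and an unlabeled prototype from pixels with pseudo-labels, using the same `class_prototypes` call with different label arrays.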