DRL-STNet: Unsupervised Domain Adaptation for Cross-modality Medical Image Segmentation via Disentangled Representation Learning

Hui Lin, Florian Schiffers, Santiago López-Tapia, Neda Tavakoli, Daniel Kim, Aggelos K. Katsaggelos

TL;DR

This work tackles cross-modality medical image segmentation under unsupervised domain adaptation by introducing DRL-STNet, which combines disentangled representation learning with GAN-based image translation and self-training. Source CT images are translated into the target MRI domain by a model with a shared content encoder, two domain-specific style encoders, and a shared decoder. This produces synthetic target-domain images that inherit the source ground-truth labels for segmentation. A 3D nnU-Net segmentation model is first trained on these synthetic data and then iteratively refined with pseudo-labels predicted on unlabeled target data. On the FLARE abdominal dataset, DRL-STNet achieves state-of-the-art performance, improving average Dice and Normalized Surface Dice over prior methods by 11.4% and 13.1%, respectively, to 74.21% and 80.69%, with self-training providing further gains. The approach shows strong potential for practical cross-modality clinical segmentation, albeit at notable computational cost and with a dependence on the quality of the disentangled representations.
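The translation data flow (shared content encoder, one style encoder per domain, shared decoder) can be illustrated with a toy sketch. It is not the authors' network: the learned encoders and decoder are replaced by hypothetical closed-form stand-ins, assuming, as in MUNIT-style models, that style can be summarized by per-channel mean and standard deviation and re-injected with an AdaIN-like affine modulation. The names `content_encoder`, `style_encoder_a`, `style_encoder_b`, `decoder`, and `translate_a_to_b` are illustrative only.

```python
import numpy as np

EPS = 1e-5  # numerical guard against division by zero


def _spatial_axes(x):
    # Images are laid out as (channels, *spatial); statistics are per channel.
    return tuple(range(1, x.ndim))


def content_encoder(x):
    """Toy shared E_c: instance-normalise each channel to strip style statistics."""
    axes = _spatial_axes(x)
    mu = x.mean(axis=axes, keepdims=True)
    sigma = x.std(axis=axes, keepdims=True)
    return (x - mu) / (sigma + EPS)


def _stat_style_encoder(x):
    """Toy style code: per-channel mean and standard deviation."""
    axes = _spatial_axes(x)
    return x.mean(axis=axes), x.std(axis=axes)


# Hypothetical E_s^a (CT) and E_s^b (MRI). In a trained model these would be two
# networks with separate weights; the toy versions share one closed-form rule.
style_encoder_a = _stat_style_encoder
style_encoder_b = _stat_style_encoder


def decoder(content, style):
    """Toy shared G: re-apply a style code to a content code (AdaIN-like)."""
    mu, sigma = style
    shape = (-1,) + (1,) * (content.ndim - 1)
    return content * sigma.reshape(shape) + mu.reshape(shape)


def translate_a_to_b(x_a, x_b):
    """Combine the content of source image x_a with the style of target image x_b."""
    c_a = content_encoder(x_a)
    s_b = style_encoder_b(x_b)
    return decoder(c_a, s_b)


# Usage: a random "CT" slice rendered with the intensity statistics of an "MRI" slice.
rng = np.random.default_rng(0)
x_ct = rng.normal(40.0, 300.0, size=(1, 64, 64))
x_mri = rng.normal(0.5, 0.2, size=(1, 64, 64))
x_fake_mri = translate_a_to_b(x_ct, x_mri)
```

Because the content encoder is shared across domains, the same content code can be decoded with either domain's style; decoding $c^a$ with its own style code $s^a$ approximately reconstructs $x^a$, while decoding it with $s^b$ yields the synthetic target image.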

Abstract

Unsupervised domain adaptation (UDA) is essential for medical image segmentation, especially in cross-modality scenarios. UDA aims to transfer knowledge from a labeled source domain to an unlabeled target domain, thereby reducing the dependency on extensive manual annotations. This paper presents DRL-STNet, a novel framework for cross-modality medical image segmentation that leverages generative adversarial networks (GANs), disentangled representation learning (DRL), and self-training (ST). Our method uses DRL within a GAN to translate images from the source modality to the target modality. The segmentation model is first trained on these translated images with the corresponding source labels and then fine-tuned iteratively on a combination of synthetic images with their source labels and real target images with pseudo-labels. The proposed framework achieves superior performance in abdominal organ segmentation on the FLARE challenge dataset, surpassing state-of-the-art methods by 11.4% in the Dice similarity coefficient and by 13.1% in the Normalized Surface Dice metric, with scores of 74.21% and 80.69%, respectively. The average running time is 41 seconds, and the area under the GPU memory-time curve is 11,292 MB. These results indicate the potential of DRL-STNet for enhancing cross-modality medical image segmentation tasks.
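The two reported metrics, Dice and Normalized Surface Dice (NSD), can be made concrete with a minimal NumPy sketch. It assumes integer label maps (0 = background, 1..K = organs), a brute-force voxel-based surface (a voxel is on the surface if any face-connected neighbour lies outside the mask), and a tolerance `tol` in the same units as `spacing`. The official FLARE evaluation code may extract surfaces, treat empty organs, and set per-organ tolerances differently, so this is illustrative only; `dice`, `surface_points`, `nsd`, and `mean_scores` are hypothetical helper names.

```python
import numpy as np


def dice(pred, gt):
    """Dice similarity coefficient of two boolean masks."""
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # convention assumed here: both masks empty counts as a match
    return 2.0 * np.logical_and(pred, gt).sum() / denom


def surface_points(mask, spacing):
    """Physical coordinates of mask voxels with a face-connected neighbour outside."""
    padded = np.pad(mask, 1)  # pads with False so border voxels count as surface
    interior = padded.copy()
    for axis in range(mask.ndim):
        for shift in (1, -1):
            interior &= np.roll(padded, shift, axis=axis)
    inner = interior[(slice(1, -1),) * mask.ndim]
    return np.argwhere(mask & ~inner) * np.asarray(spacing, dtype=float)


def nsd(pred, gt, tol, spacing):
    """Normalized Surface Dice: share of both surfaces lying within `tol` of the other."""
    sp, sg = surface_points(pred, spacing), surface_points(gt, spacing)
    if len(sp) == 0 and len(sg) == 0:
        return 1.0
    if len(sp) == 0 or len(sg) == 0:
        return 0.0
    # Brute-force pairwise distances: fine for small masks, too slow for full volumes.
    dist = np.linalg.norm(sp[:, None, :] - sg[None, :, :], axis=-1)
    close = (dist.min(axis=1) <= tol).sum() + (dist.min(axis=0) <= tol).sum()
    return close / (len(sp) + len(sg))


def mean_scores(pred_labels, gt_labels, num_classes, tol=1.0, spacing=None):
    """Average Dice and NSD over organ labels 1..num_classes."""
    if spacing is None:
        spacing = (1.0,) * gt_labels.ndim
    dices, nsds = [], []
    for k in range(1, num_classes + 1):
        p, g = pred_labels == k, gt_labels == k
        dices.append(dice(p, g))
        nsds.append(nsd(p, g, tol, spacing))
    return float(np.mean(dices)), float(np.mean(nsds))


# Usage: a 4x4 "organ" predicted one voxel too low.
gt = np.zeros((10, 10), dtype=int)
gt[2:6, 2:6] = 1
pred = np.zeros_like(gt)
pred[3:7, 2:6] = 1
print(mean_scores(pred, gt, num_classes=1, tol=1.0))  # prints (0.75, 1.0)
```

The example shows why both metrics are reported: a one-voxel shift costs a quarter of the Dice overlap, yet every boundary voxel still lies within the 1-voxel tolerance, so NSD stays at 1.0.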

Paper Structure (21 sections, 9 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Overview of the proposed DRL-STNet framework. The framework consists of five stages. Stages 1-2 perform image translation from source to target: an image translation model based on disentangled representation learning is trained to generate synthetic target volumes from real source volumes. Stages 3-5 perform self-training via pseudo-labeling: the segmentation model is trained on the synthetic target volumes with the corresponding source labels, pseudo-labels are predicted on the unlabeled target volumes, and the segmentation model is fine-tuned on the combined data. Stages 4 and 5 are performed iteratively. The detailed architecture of the image translation model is shown in Figure 2. Viewing this figure in color is advised in the printed edition.
  • Figure 2: The proposed image translation model using representation disentanglement. The model is composed of one shared content encoder $E_{c}$, two style encoders $E_{s}^{a}$ and $E_{s}^{b}$, and one shared decoder $G$. An image from each domain is disentangled into a content representation and a style representation. The source image $x^a$ is translated into the target domain $b$ by decoding its content code $c^a$ together with the target style code $s^b$, i.e., $G(c^a, s^b)$.
  • Figure 3: Examples of source (CT), target (MRI), and generated slices produced by the proposed method. Since there is no ground truth for unpaired image translation, the small differences between the first and second columns, as well as between the first and fourth columns, suggest that our translation model is reliable.
  • Figure 4: Examples of segmentation results from the validation set. The first two rows illustrate successful segmentation outcomes, while the last two rows demonstrate cases with less accurate segmentation. The columns represent the original image, ground truth, results from our method, and results from our method without self-training (ST). Different organs are color-coded for clear visualization.
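The self-training loop of Stages 3-5 in Figure 1 can be sketched as follows. This is a toy, not the authors' pipeline: a hypothetical nearest-centroid intensity classifier (`CentroidSegmenter`) stands in for the 3D nnU-Net, fine-tuning is approximated by refitting on the combined data, and every pseudo-label is kept (no confidence filtering is assumed). The voxel intensities are made up purely to show the mechanism: a model trained only on synthetic data misclassifies a shifted target voxel, and pseudo-labeling pulls its decision boundary toward the real target distribution.

```python
import numpy as np


class CentroidSegmenter:
    """Hypothetical stand-in for the 3D nnU-Net: assigns each voxel the class
    whose mean training intensity (centroid) is closest."""

    def fit(self, x, y, num_classes):
        self.centroids = np.array([x[y == k].mean() for k in range(num_classes)])
        return self

    def predict(self, x):
        return np.abs(x[..., None] - self.centroids).argmin(axis=-1)


def self_train(x_syn, y_syn, x_tgt, num_classes, rounds=3):
    # Stage 3: train on synthetic target-style data with the source labels.
    model = CentroidSegmenter().fit(x_syn, y_syn, num_classes)
    for _ in range(rounds):
        # Stage 4: predict pseudo-labels on the unlabeled real target data.
        pseudo = model.predict(x_tgt)
        # Stage 5: refit on synthetic (source labels) + real target (pseudo-labels).
        x_all = np.concatenate([x_syn.ravel(), x_tgt.ravel()])
        y_all = np.concatenate([y_syn.ravel(), pseudo.ravel()])
        model = CentroidSegmenter().fit(x_all, y_all, num_classes)
    return model


# Usage with made-up voxel intensities (two classes, target shifted upward).
x_syn = np.array([0.0, 0.0, 0.0, 4.0, 4.0, 4.0])
y_syn = np.array([0, 0, 0, 1, 1, 1])
x_tgt = np.array([1.5, 1.8, 2.2, 5.5, 5.8, 6.2])
y_tgt = np.array([0, 0, 0, 1, 1, 1])  # used only for evaluation, never for training

before = CentroidSegmenter().fit(x_syn, y_syn, 2).predict(x_tgt)
after = self_train(x_syn, y_syn, x_tgt, num_classes=2).predict(x_tgt)
print((before == y_tgt).mean(), (after == y_tgt).mean())
```

On this toy data, target accuracy rises from 5/6 before self-training to 1.0 after it. The synthetic data are kept in every round so that each class retains correctly labeled examples even if the pseudo-labels are wrong.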