DRL-STNet: Unsupervised Domain Adaptation for Cross-modality Medical Image Segmentation via Disentangled Representation Learning
Hui Lin, Florian Schiffers, Santiago López-Tapia, Neda Tavakoli, Daniel Kim, Aggelos K. Katsaggelos
TL;DR
This work tackles cross-modality medical image segmentation under unsupervised domain adaptation by introducing DRL-STNet, which combines disentangled representation learning with GAN-based image translation and self-training. Source CT images are translated into the target MRI domain via a shared content encoder, dual style encoders, and a common decoder, producing synthetic target data with ground-truth labels for segmentation. The segmentation model is initially trained on synthetic data and then iteratively refined using pseudo-labels on unlabeled target data, leveraging a 3D nnU-Net backbone. On the FLARE abdominal dataset, DRL-STNet achieves state-of-the-art performance, notably improving average Dice and Normalized Surface Dice by substantial margins, with self-training providing further gains; the approach demonstrates strong potential for practical cross-modality clinical segmentation, albeit at notable computational cost and with dependence on the quality of the disentangled representations.
Abstract
Unsupervised domain adaptation (UDA) is essential for medical image segmentation, especially in cross-modality data scenarios. UDA aims to transfer knowledge from a labeled source domain to an unlabeled target domain, thereby reducing the dependency on extensive manual annotations. This paper presents DRL-STNet, a novel framework for cross-modality medical image segmentation that leverages generative adversarial networks (GANs), disentangled representation learning (DRL), and self-training (ST). Our method leverages DRL within a GAN to translate images from the source to the target modality. Then, the segmentation model is initially trained with these translated images and corresponding source labels and then fine-tuned iteratively using a combination of synthetic and real images with pseudo-labels and real labels. The proposed framework exhibits superior performance in abdominal organ segmentation on the FLARE challenge dataset, surpassing state-of-the-art methods by 11.4% in the Dice similarity coefficient and by 13.1% in the Normalized Surface Dice metric, achieving scores of 74.21% and 80.69%, respectively. The average running time is 41 seconds, and the area under the GPU memory-time curve is 11,292 MB. These results indicate the potential of DRL-STNet for enhancing cross-modality medical image segmentation tasks.
