Diffuse-UDA: Addressing Unsupervised Domain Adaptation in Medical Image Segmentation with Appearance and Structure Aligned Diffusion Models

Haifan Gong, Yitao Wang, Yihan Wang, Jiashun Xiao, Xiang Wan, Haofeng Li

TL;DR

Diffuse-UDA tackles unsupervised domain adaptation for 3D medical image segmentation under cross-center and cross-modality domain shifts, where voxel-level labels are scarce. It introduces ASCPlus to improve target pseudo-label quality and a conditional diffusion model with deformable augmentation to synthesize high-quality image–mask pairs aligned to the target domain, then trains on a mix of source and generated data. Extensive experiments on FeTA fetal brain MRI and MM-WHS cardiac datasets show that Diffuse-UDA outperforms state-of-the-art UDA and SSL methods and approaches or even surpasses the performance of models trained with target-domain labels. The approach also provides synthetic data at scale and demonstrates improved feature alignment across domains, suggesting practical potential for fairer and more robust AI deployment in diverse clinical settings. Overall, Diffuse-UDA offers a scalable plug-and-play solution to bridge domain gaps in medical imaging and enable cross-center adoption of AI tools.

Abstract

The scarcity and complexity of voxel-level annotations in 3D medical imaging present significant challenges, particularly due to the domain gap between labeled datasets from well-resourced centers and unlabeled datasets from less-resourced centers. This disparity affects the fairness of artificial intelligence algorithms in healthcare. We introduce Diffuse-UDA, a novel method leveraging diffusion models to tackle Unsupervised Domain Adaptation (UDA) in medical image segmentation. Diffuse-UDA generates high-quality image-mask pairs with target domain characteristics and various structures, thereby enhancing UDA tasks. Initially, pseudo labels for target domain samples are generated. Subsequently, a specially tailored diffusion model, incorporating deformable augmentations, is trained on image-label or image-pseudo-label pairs from both domains. Finally, source domain labels guide the diffusion model to generate image-label pairs for the target domain. Comprehensive evaluations on several benchmarks demonstrate that Diffuse-UDA outperforms leading UDA and semi-supervised strategies, achieving performance close to or even surpassing the theoretical upper bound of models trained directly on target domain data. Diffuse-UDA offers a pathway to advance the development and deployment of AI systems in medical imaging, addressing disparities between healthcare environments. This approach enables the exploration of innovative AI-driven diagnostic tools, improves outcomes, saves time, and reduces human error.
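The three stages named in the abstract can be read as a data flow. The sketch below is only an illustration under stated assumptions, not the authors' implementation: `build_training_set`, `pseudo_label`, and `train_generator` are hypothetical names, the two callables stand in for the pseudo-labeling model and the conditional diffusion model, and plain Python lists stand in for 3D volumes and masks. How the generator is steered toward target-domain appearance is not shown here.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[float]  # stand-in for a 3D volume (assumption for illustration)
Mask = List[int]     # stand-in for a voxel-wise label map
Pair = Tuple[Image, Mask]


def build_training_set(
    source_pairs: Sequence[Pair],
    target_images: Sequence[Image],
    pseudo_label: Callable[[Image], Mask],
    train_generator: Callable[[Sequence[Pair]], Callable[[Mask], Image]],
) -> List[Pair]:
    """Hypothetical helper mirroring the three stages in the abstract."""
    # Stage 1: attach pseudo labels to the unlabeled target-domain images.
    target_pairs = [(img, pseudo_label(img)) for img in target_images]

    # Stage 2: fit a mask-conditioned generator on pairs from both domains.
    generate = train_generator(list(source_pairs) + target_pairs)

    # Stage 3: source-domain masks guide the generator, so every
    # synthetic image comes with a known mask.
    synthetic_pairs = [(generate(mask), mask) for _, mask in source_pairs]

    # The segmentation model is then trained on source plus generated pairs.
    return list(source_pairs) + synthetic_pairs
```

Passing the two models as callables keeps the sketch independent of any framework; in practice each would wrap a trained network.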

Paper Structure

This paper contains 22 sections, 16 equations, 6 figures, 4 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Comparison of conventional medical AI applications and our Diffuse-UDA medical AI application. a. Conventional medical AI applications are trained on annotated data from Center A and used at Center A, but their performance often drops significantly when they are transferred to Center B because of domain differences in imaging and patient conditions. b. The proposed Diffuse-UDA method uses a diffusion model to generate new data from labeled data at Center A and unlabeled data at Center B, with an additional style-structure consistency network to keep the trained model usable when it is moved to other data centers.
  • Figure 2: Two typical domain gap situations. a. Patients at Center A are scanned with an MR scanner of model A, while patients at Center B are scanned with an MR scanner of model B, resulting in differences in individual patient characteristics and imaging equipment parameters. b. Patients at Center A are scanned with an MR scanner, while patients at Center B are scanned with a CT scanner, leading to differences in individual patient characteristics and imaging modalities. c. Visualization of high-level features (taken between the encoder and decoder) for data from different domains. d. Visualization of the decoder's final features for data from different domains.
  • Figure 3: Overview of Diffuse-UDA. a. The three key steps in our process and the time required for each. b. The meaning of the letters in this table. c. The first step uses the proposed ASCPlus model to assign pseudo-labels to the unlabeled data from the target domain. d. The second step trains the proposed conditional diffusion model on paired image-mask data and generates samples with various appearances and structures using masks from the source domain. e. The third step trains the ASCPlus model for deployment at Center B, using the source domain data and pseudo-labeled data as the new source domain data and the unlabeled data as the target domain data. f. The proposed ASCPlus network architecture, where EMA denotes exponential moving average updates. The student model learns from source data $D_{s}$ and frequency-based transformed source data $D_{sft}$ via the supervised loss $L_{seg}$. Appearance and structure consistency is enforced by the loss $L_{asc}$. g. Detailed design of our conditional diffusion model, which contains a deformable augmentation module for diverse and robust generation. The FC layer is a fully connected layer that embeds the condition (label).
  • Figure 4: Detailed analysis of different ways of constructing the training data in our Diffuse-UDA framework. (a) investigates the effect of using different amounts of sampled data from the source domain and target domain. (b) and (c) show the use of the deformation module on different types of training data. (d) and (e) analyze the effect of target-domain pseudo-label quality (LQPL: low-quality pseudo-labels predicted by PL [yan2022unsupervised]. MQPL: middle-quality pseudo-labels predicted by ASC [xu2023asc]. HQPL: high-quality pseudo-labels predicted by our ASCPlus.). (f) details the analysis of training ASCPlus with and without our proposed deformable data augmentation module.
  • Figure 5: Detailed analysis of using different types of testing data in our Diffuse-UDA framework. (a) Different numbers of samples used to train the neural network. (b) Different sampling strategies used to obtain the training data.
  • ...and 1 more figure
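The Figure 3 caption mentions exponential moving average (EMA) updates in ASCPlus. As a minimal sketch, assuming the common mean-teacher convention in which a teacher's weights track the student's, the update can be written as below. The function name `ema_update`, the dict-of-lists weight layout, and the decay value of 0.99 are illustrative assumptions; the caption does not specify them.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flat list of values (assumed layout)


def ema_update(teacher: Weights, student: Weights, decay: float = 0.99) -> None:
    """Move each teacher weight toward the student weight, in place.

    teacher <- decay * teacher + (1 - decay) * student
    The decay of 0.99 is an illustrative default, not a value from the paper.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    for name, student_values in student.items():
        teacher_values = teacher[name]
        for i, s in enumerate(student_values):
            teacher_values[i] = decay * teacher_values[i] + (1.0 - decay) * s
```

A decay close to 1 makes the teacher change slowly, which smooths out noise in individual student updates. Calling `ema_update(teacher, student)` once after each student optimization step is the usual pattern.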