AdaptDiff: Cross-Modality Domain Adaptation via Weak Conditional Semantic Diffusion for Retinal Vessel Segmentation

Dewei Hu, Hao Li, Han Liu, Jiacheng Wang, Xing Yao, Daiwei Lu, Ipek Oguz

TL;DR

AdaptDiff is an unsupervised domain adaptation (UDA) method that enables a retinal vessel segmentation network trained on fundus photography to produce satisfactory results on unseen modalities (e.g., OCT-A) without any manual labels.

Abstract

Deep learning has shown remarkable performance in medical image segmentation. However, despite its promise, deep learning faces many challenges in practice because models often fail to transfer effectively to unseen domains, owing to the inherent data distribution shift and the lack of manual annotations to guide domain adaptation. To tackle this problem, we present an unsupervised domain adaptation (UDA) method named AdaptDiff that enables a retinal vessel segmentation network trained on fundus photography (FP) to produce satisfactory results on unseen modalities (e.g., OCT-A) without any manual labels. For all our target domains, we first use a segmentation model trained on the source domain to create pseudo-labels. With these pseudo-labels, we train a conditional semantic diffusion probabilistic model to represent the target domain distribution. Experimentally, we show that even with low-quality pseudo-labels, the diffusion model can still capture the conditional semantic information. Subsequently, we sample on the target domain with binary vessel masks from the source domain to get paired data, i.e., target domain synthetic images conditioned on the binary vessel map. Finally, we fine-tune the pre-trained segmentation network using the synthetic paired data to mitigate the domain gap. We assess the effectiveness of AdaptDiff on seven publicly available datasets across three distinct modalities. Our results demonstrate a significant improvement in segmentation performance across all unseen datasets. Our code is publicly available at https://github.com/DeweiHu/AdaptDiff.
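
The four stages described in the abstract can be read as a simple data flow. The sketch below is a minimal illustration of that flow only, not the authors' implementation (see the linked repository for that). The function `adaptdiff_pipeline`, its parameter names, and the toy stand-ins (a threshold "segmenter" and a class-mean "synthesizer" in place of the real segmentation network and diffusion model) are all assumptions introduced here.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[float]  # toy image: flat list of pixel intensities
Mask = List[int]     # toy binary vessel mask: 1 = vessel, 0 = background
Pairs = List[Tuple[Image, Mask]]
Segmenter = Callable[[Image], Mask]
Synthesizer = Callable[[Mask], Image]


def adaptdiff_pipeline(
    source_images: Sequence[Image],
    source_masks: Sequence[Mask],
    target_images: Sequence[Image],
    train_segmenter: Callable[[Pairs], Segmenter],
    train_synthesizer: Callable[[Pairs], Synthesizer],
    fine_tune: Callable[[Segmenter, Pairs], Segmenter],
) -> Tuple[Segmenter, Pairs]:
    """Hypothetical orchestration of the four stages; all callables are supplied by the caller."""
    # Stage 1: train on labeled source data, then pseudo-label the target images.
    f_seg = train_segmenter(list(zip(source_images, source_masks)))
    pseudo_labels = [f_seg(x_t) for x_t in target_images]
    # Stage 2: train the conditional generator on (target image, pseudo-label) pairs.
    f_syn = train_synthesizer(list(zip(target_images, pseudo_labels)))
    # Stage 3: sample target-style images conditioned on the real source masks.
    synthetic_pairs = [(f_syn(y), list(y)) for y in source_masks]
    # Stage 4: fine-tune the segmenter on (synthetic target image, real mask) pairs.
    return fine_tune(f_seg, synthetic_pairs), synthetic_pairs


# --- Toy stand-ins (assumptions, used only to exercise the data flow) ---
def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def _class_means(pairs: Pairs) -> Tuple[float, float]:
    vessel = [v for x, y in pairs for v, m in zip(x, y) if m == 1]
    background = [v for x, y in pairs for v, m in zip(x, y) if m == 0]
    return _mean(vessel), _mean(background)


def toy_train_segmenter(pairs: Pairs) -> Segmenter:
    # "Training" picks a threshold halfway between the two class means.
    mean_v, mean_b = _class_means(pairs)
    threshold = (mean_v + mean_b) / 2
    return lambda x: [1 if v > threshold else 0 for v in x]


def toy_train_synthesizer(pairs: Pairs) -> Synthesizer:
    # "Training" memorizes the mean intensity of each class under the given labels.
    mean_v, mean_b = _class_means(pairs)
    return lambda y: [mean_v if m == 1 else mean_b for m in y]


def toy_fine_tune(f_seg: Segmenter, pairs: Pairs) -> Segmenter:
    # The toy version simply re-fits the threshold and ignores the previous model.
    return toy_train_segmenter(pairs)


source_images = [[0.9, 0.1, 0.8, 0.2]]
source_masks = [[1, 0, 1, 0]]
target_images = [[0.7, 0.3, 0.6, 0.2]]

f_adapted, synthetic_pairs = adaptdiff_pipeline(
    source_images, source_masks, target_images,
    toy_train_segmenter, toy_train_synthesizer, toy_fine_tune,
)
```

The point to notice is which labels each stage sees: the generator is trained only on pseudo-labels, while the final fine-tuning step pairs synthetic target-style images with the real source masks. The toy stand-ins say nothing about segmentation quality.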

Paper Structure (11 sections, 2 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Left: example patches for datasets in FP, FA, and OCT-A. The outlines color-code the modality (red: FA, yellow: OCT-A, blue: FP). Right: t-SNE plot visualizing the separation between domains in feature space (features extracted by a pre-trained VGG-16).
  • Figure 2: (Step 1) Train the segmentation model $f_{seg}$ on the labeled source domain and test it on target domain images to create pseudo-labels. (Step 2) Train a semantic conditional diffusion model $f_{syn}$ with $\{\mathbf{x}^{\mathcal{T}}, \hat{\mathbf{y}}\}$. (Step 3) Run inference with the synthesis model $f_{syn}$ to generate target domain samples corresponding to the real labels $\mathbf{y}$. (Step 4) Fine-tune the segmentation model $f_{seg}$ on the target domain with $\{\hat{\mathbf{x}}^{\mathcal{T}}, \mathbf{y}\}$. Dashed/solid lines: model training/testing. Different marker shapes represent distinct anatomies. Solid shapes are real images and manual labels; outlines are synthetic images and pseudo-labels.
  • Figure 3: Weak conditional diffusion model. The semantic condition $\hat{\mathbf{y}}$ is added to the residual U-Net model by the spatially-adaptive normalization (SPADE) block [park2019semantic].
  • Figure 4: Left: performance of the semantic diffusion model trained with polluted labels as semantic conditions. Green: cases where the generated images are visually well-correlated with the label during testing. Gray: borderline cases where some vessels are missed during testing. Red: cases where the model fails. Right: highlighted examples from each performance level (black-outlined cells from the left panel). Top row: the increasingly polluted labels of $\mathbf{y}_1$ during training. Bottom row: a test label $\mathbf{y}_2$ and the sampled images $f_{syn}(\mathbf{y}_2)$ during testing. The yellow arrows highlight missed vessels in $f^{\{0.4,0.2\}}_{syn}(\mathbf{y}_2)$. $f^{\{0.6,0.3\}}_{syn}(\mathbf{y}_2)$ does not correlate well with the input label $\mathbf{y}_2$.
  • Figure 5: Qualitative results. Top row: generated OCT-A images. Yellow arrows: preservation of a thin vessel. Red arrows: accuracy of vessel thickness. Green box: hallucinated vessels. Bottom row: generated FA images. Yellow boxes highlight that CycleGAN, CUT and SynSeg are not able to capture the vessels due to the poor contrast in $\mathbf{x}^{\mathcal{S}}$. AdaptDiff-synthesized images have the correct anatomy in each of these cases.
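
For readers unfamiliar with the SPADE block named in the Figure 3 caption, its general form can be sketched as follows. This is the standard spatially-adaptive normalization written with the condition $\hat{\mathbf{y}}$ from the captions; the symbols $h$ (input activation), $\mu_c$ and $\sigma_c$ (per-channel normalization statistics), $\gamma$ and $\beta$ (learned modulation maps), and the indices $c, i, j$ (channel, row, column) are notation assumed here, not taken from the paper.

$$
\mathrm{SPADE}(h, \hat{\mathbf{y}})_{c,i,j} = \gamma_{c,i,j}(\hat{\mathbf{y}}) \, \frac{h_{c,i,j} - \mu_c}{\sigma_c} + \beta_{c,i,j}(\hat{\mathbf{y}})
$$

Because $\gamma$ and $\beta$ are computed from the label map and vary per pixel, the vessel layout modulates every normalized feature map spatially instead of being supplied once at the network input. Which normalization statistics AdaptDiff uses inside its U-Net is not stated in this summary.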