Diffusion Adversarial Representation Learning for Self-supervised Vessel Segmentation

Boah Kim; Yujin Oh; Jong Chul Ye

TL;DR

This work tackles vessel segmentation without labeled data by introducing DARL, a non-iterative diffusion-adversarial framework that jointly learns background signals via a diffusion module and vessel representations via a generation module equipped with switchable SPADE layers. The model uses a diffusion loss to model backgrounds, adversarial losses to produce realistic vessel masks and angiograms, and a cycle-consistency loss to enforce semantic alignment with fractal vessel masks, enabling robust one-step segmentation. DARL achieves state-of-the-art performance among unsupervised and self-supervised methods on coronary angiography and cross-domain retinal datasets, with strong noise robustness and generalization to unseen imaging modalities. The approach offers fast inference, improved vessel localization, and a reusable framework for general vascular segmentation without heavy labeling requirements.

Abstract

Vessel segmentation in medical images is an important task in the diagnosis of vascular diseases and in therapy planning. Although learning-based segmentation approaches have been extensively studied, supervised methods require a large number of ground-truth labels, and confusing background structures make it hard for neural networks to segment vessels in an unsupervised manner. To address this, we introduce a novel diffusion adversarial representation learning (DARL) model that combines a denoising diffusion probabilistic model with adversarial learning, and apply it to vessel segmentation. In particular, for self-supervised vessel segmentation, DARL learns the background signal using a diffusion module, which lets a generation module effectively provide vessel representations. In addition, through adversarial learning based on the proposed switchable spatially-adaptive denormalization, our model estimates synthetic fake vessel images as well as vessel segmentation masks, which further helps the model capture vessel-relevant semantic information. Once trained, the model generates segmentation masks in a single step and can be applied to general vascular structure segmentation of coronary angiography and retinal images. Experimental results on various datasets show that our method significantly outperforms existing unsupervised and self-supervised vessel segmentation methods.
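The noisy inputs that a denoising diffusion probabilistic model works with can be illustrated with the standard closed-form forward process, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. The following is a minimal sketch of that generic step only, not the authors' implementation: the function names, the linear schedule, its endpoints, the 1000-step length, and the image size are all assumptions chosen for illustration.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Assumed linear variance schedule; the endpoints are illustrative defaults."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_noise(x0, t, betas, rng):
    """Sample x_t from q(x_t | x_0) for a 0-based step index t.

    Returns the noisy image and the Gaussian noise that was mixed in,
    which is the target a noise-prediction network is trained to recover.
    """
    alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta)
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

# Toy stand-in for a single-channel image scaled to [-1, 1] (hypothetical data).
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(1, 64, 64))
betas = linear_beta_schedule(1000)
x_t, eps = forward_noise(x0, t=200, betas=betas, rng=rng)
```

A larger `t` moves `x_t` closer to pure Gaussian noise, which is why the noise level of the input matters when a mask is estimated in a single step.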

Paper Structure (43 sections, 15 equations, 13 figures, 12 tables)

Introduction
Backgrounds and related works
Denoising diffusion probabilistic model
Self-supervised vessel segmentation
Diffusion adversarial representation learning
Generation module with switchable SPADE layers
Network Training
Loss function
Diffusion loss
Adversarial loss
Cyclic reconstruction loss
Image perturbation for the model input
Inference of vessel segmentation
Experiments
Datasets
...and 28 more sections

Figures (13)

Figure 1: Our proposed diffusion adversarial representation model for self-supervised vessel segmentation. In path (A), given a real noisy angiography image ${\boldsymbol x}^{a}_{t_a}$, our model estimates vessel segmentation masks $\hat{{\boldsymbol s}}^{v}$. In path (B), given a real noisy background image ${\boldsymbol x}^{b}_{t_b}$ and a vessel-like fractal mask ${\boldsymbol s}^{f}$, our model generates a synthetic angiography image $\hat{{\boldsymbol x}}^{a}$.
Figure 2: Training flow of our model. The generation module $G$ with the switchable SPADE layers takes $\bm\epsilon_{\bm\theta}$ and the noisy images, and generates the desired output for each path. ${{\boldsymbol x}}^{a}_{t_{a}}$ and ${{\boldsymbol x}}^{b}_{t_{b}}$ denote the noisy angiography and background images, where $t_a$ and $t_b$ are the noise levels. $\hat{{\boldsymbol s}}^{v}$ is the generated vessel segmentation mask, and $\hat{{\boldsymbol x}}^{a}$ is the synthetic angiography image. ${\boldsymbol s}^f$ is the vessel-like fractal mask. $Cat$ denotes the concatenation of images along the channel dimension. $\mathcal{L}_{diff}$, $\mathcal{L}_{adv}$, and $\mathcal{L}_{cyc}$ are the diffusion loss, the adversarial loss, and the cycle loss, respectively.
Figure 3: Vessel segmentation according to the noise level $t_a$. Our model estimates the segmentation masks $\hat{{\boldsymbol s}}^{v}$ using the latent features $\bm\epsilon_{\bm \theta}$ for the noisy angiograms ${\boldsymbol x}_{t_a}^a$. ${\boldsymbol s}^{v}$ is the ground-truth label.
Figure 4: Visual comparison results on the vessel segmentation of various angiography images.
Figure 5: Estimated latent features $\bm\epsilon_{\bm\theta}$ in the (A) and (B) paths of our model.
...and 8 more figures
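The captions for Figures 1 and 2 describe one generation module that behaves differently in path (A), where no mask is given, and path (B), where a fractal mask ${\boldsymbol s}^{f}$ is supplied. One way to picture such a switch is sketched below. It assumes the layer falls back to plain instance normalization when no mask is passed and applies SPADE-style spatial modulation otherwise. The function names, the per-channel weights `w_gamma` and `w_beta` (a stand-in for the learned convolutions of a real SPADE layer), and the tensor sizes are hypothetical, so this is an illustration of the idea and not the authors' layer.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel of a (C, H, W) feature map over its spatial axes."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def switchable_spade(x, mask=None, w_gamma=None, w_beta=None):
    """Instance norm alone when mask is None; mask-driven modulation otherwise."""
    h = instance_norm(x)
    if mask is None:
        # Assumed behavior for the mask-free path.
        return h
    # Hypothetical stand-in for learned convolutions: each channel gets a
    # scale map and a shift map that are proportional to the (H, W) mask.
    gamma = w_gamma[:, None, None] * mask[None, :, :]
    beta = w_beta[:, None, None] * mask[None, :, :]
    return h * (1.0 + gamma) + beta

# Toy inputs (hypothetical shapes): 4-channel features and a binary mask.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
mask = (rng.uniform(size=(8, 8)) > 0.7).astype(np.float64)
w_gamma = rng.standard_normal(4)
w_beta = rng.standard_normal(4)

out_a = switchable_spade(feat)                         # mask-free path
out_b = switchable_spade(feat, mask, w_gamma, w_beta)  # mask-conditioned path
```

In this sketch the two outputs agree wherever the mask is zero and differ only inside the masked region, which is the sense in which the mask injects spatial, vessel-shaped information into the features.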
