Enhancing Retinal Vessel Segmentation Generalization via Layout-Aware Generative Modelling

Jonathan Fhima, Jan Van Eijgen, Lennert Beeckmans, Thomas Jacobs, Moti Freiman, Luis Filipe Nakayama, Ingeborg Stalmans, Chaim Baskin, Joachim A. Behar

TL;DR

This work tackles the generalization gap in retinal vessel segmentation caused by limited annotated data. It introduces RLAD, a diffusion-based framework that generates layout-conditioned retinal fundus images. RLAD operates in the latent space of a frozen VAE and conditions generation on multiple retinal structures extracted from real images: artery/vein (AV), optic cup/disc (CD), and lesions (L). This enables realistic, controllable data augmentation for AV segmentation. Training with RLAD-generated data improves generalization across segmentation backbones by up to 8.1%. The work also introduces REYIA, a dataset of 586 manually segmented retinal images, and the authors state that code and data will be released to support reproducibility. Overall, RLAD indicates that targeted synthetic data can substantially improve segmentation generalization, and it suggests a path for extending layout-aware diffusion to other medical imaging tasks.

Abstract

Generalization in medical segmentation models is challenging due to limited annotated datasets and imaging variability. To address this, we propose Retinal Layout-Aware Diffusion (RLAD), a novel diffusion-based framework for generating controllable layout-aware images. RLAD conditions image generation on multiple key layout components extracted from real images, ensuring high structural fidelity while enabling diversity in other components. Applied to retinal fundus imaging, we augmented the training datasets by synthesizing paired retinal images and vessel segmentations conditioned on extracted blood vessels from real images, while varying other layout components such as lesions and the optic disc. Experiments demonstrated that RLAD-generated data improved generalization in retinal vessel segmentation by up to 8.1%. Furthermore, we present REYIA, a comprehensive dataset comprising 586 manually segmented retinal images. To foster reproducibility and drive innovation, both our code and dataset will be made publicly accessible.

Paper Structure

This paper contains 30 sections, 10 equations, 7 figures, 13 tables.

Figures (7)

  • Figure 1: Retinal Layout-Aware Diffusion generates realistic retinal images from noise and user-defined layout components: artery/vein (AV), optic cup/disc (CD), and lesions (L).
  • Figure 2: RLAD Architecture. The original fundus image and the segmentation maps for artery/vein (AV), optic cup/disc (CD), and lesions (L) are encoded into latent representations using a frozen VAE. Gaussian noise is added to the image latent, and each latent (image, CD, AV, and L) is projected into the DiT [peebles2023scalable] input space via distinct projections. Condition embeddings for AV, CD, and L are summed into a single embedding, $c$. The DiT input consists of a beginning-of-conditioning (BOC) token, user input (UI), $c$, an end-of-conditioning (EOC) token, and the noised image latent. The DiT outputs the corresponding denoised image latent. The UI token specifies whether a layout component is guided by user input or defaults to a neutral embedding when absent.
  • Figure 3: Retinal Layout-Aware Diffusion Qualitative Examples. Top: user-defined layout component inputs (artery/vein in red/blue, optic disc/cup in green/yellow, and lesions in white/pink/orange). Bottom: corresponding generated fundus images.
  • Figure 4: Qualitative Example on the Segmentation Downstream Task. Our model's AV segmentation is compared with that of a SwinV2$_{\text{Large}}$ [liu2021swin] trained on the UZLF dataset and with LUNet [fhima2024lunet], a SOTA model, showing superior performance on fundus images from various datasets.
  • Figure 5: RLAD Performance vs. Training Data Size. The figure illustrates the learning curve of the SwinV2$_{\text{tiny}}$ [liu2021swin] baseline on OOD datasets, demonstrating enhanced performance with RLAD-generated data. The data percentage reflects both real and generated samples, maintaining a 1:15 ratio (real:generated).
  • ...and 2 more figures
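
The input assembly described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: the sizes `LATENT_DIM` and `MODEL_DIM`, the random matrices standing in for learned projections, the encoding of the UI token as zero-padded presence flags, and the choice to treat $c$ as one embedding per latent token are all assumptions made for illustration. Only the ordering (BOC, UI, $c$, EOC, noised image latent), the summation of condition embeddings, and the neutral fallback for absent components come from the caption.

```python
import random

LATENT_DIM, MODEL_DIM = 4, 8  # hypothetical sizes, for illustration only
COMPONENTS = ("AV", "CD", "L")

def make_projection(rng):
    """Random MODEL_DIM x LATENT_DIM matrix standing in for a learned projection."""
    return [[rng.gauss(0.0, 0.1) for _ in range(LATENT_DIM)] for _ in range(MODEL_DIM)]

def project(weights, tokens):
    """Project each latent token (length LATENT_DIM) to MODEL_DIM."""
    return [[sum(w * x for w, x in zip(row, tok)) for row in weights] for tok in tokens]

def build_dit_input(image_latent, cond_latents, params, noise_scale, rng):
    """Assemble [BOC, UI, c..., EOC, noised image tokens] in the Figure 2 order."""
    n = len(image_latent)
    # Gaussian noise is added to the image latent only.
    noised = [[x + noise_scale * rng.gauss(0.0, 1.0) for x in tok] for tok in image_latent]
    image_tokens = project(params["proj"]["image"], noised)

    # Sum the per-component condition embeddings into a single embedding c.
    c = [[0.0] * MODEL_DIM for _ in range(n)]
    flags = []
    for name in COMPONENTS:
        latent = cond_latents.get(name)
        flags.append(1.0 if latent is not None else 0.0)
        if latent is not None:
            emb = project(params["proj"][name], latent)  # distinct projection per component
        else:
            emb = [params["neutral"][name]] * n  # neutral embedding when absent
        c = [[a + b for a, b in zip(ct, et)] for ct, et in zip(c, emb)]

    # Assumption: the UI token is the presence flags, zero-padded to MODEL_DIM.
    ui = flags + [0.0] * (MODEL_DIM - len(flags))
    return [params["boc"], ui] + c + [params["eoc"]] + image_tokens

rng = random.Random(0)
params = {
    "proj": {k: make_projection(rng) for k in ("image",) + COMPONENTS},
    "neutral": {k: [0.0] * MODEL_DIM for k in COMPONENTS},
    "boc": [1.0] * MODEL_DIM,
    "eoc": [-1.0] * MODEL_DIM,
}
image = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(16)]
av = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(16)]

# Only the AV layout is supplied; CD and L fall back to the neutral embedding.
seq = build_dit_input(image, {"AV": av}, params, noise_scale=0.5, rng=rng)
print(len(seq))  # 1 (BOC) + 1 (UI) + 16 (c) + 1 (EOC) + 16 (image) = 35
```

In this sketch, omitting a component from `cond_latents` leaves that part of the layout unconstrained, which mirrors the caption's description of generation that is guided by user input for some components and free for the others.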