VAOT: Vessel-Aware Optimal Transport for Retinal Fundus Enhancement

Xuanzhao Dong, Wenhui Zhu, Yujian Xiong, Xiwen Chen, Hao Wang, Xin Li, Jiajun Cheng, Zhipeng Wang, Shao Tang, Oana Dumitrascu, Yalin Wang

TL;DR

Retinal CFP quality varies with acquisition factors, and unpaired enhancement methods often distort vasculature, compromising clinical utility. The authors propose Vessel-Aware Optimal Transport (VAOT), which combines an optimal-transport backbone with two structure-preserving regularizers: a skeleton-guided global morphology alignment and an endpoint-aware local integrity term, enabled by differentiable soft-skeletonization and endpoint-centered local windows. Training occurs in two phases: Phase 1 optimizes the OT objective to approximate the transport map $f^*$, and Phase 2 adds the skeleton and endpoint regularizers to refine global topology and local structure while preserving denoising performance. Empirical results on EyeQ and cross-dataset tests (IDRiD and DRIVE) show VAOT achieves superior denoising metrics and better preservation of vascular topology, improving downstream vessel and lesion segmentation; the method is open-sourced at the provided GitHub repository.

Abstract

Color fundus photography (CFP) is central to diagnosing and monitoring retinal disease, yet acquisition variability (e.g., illumination changes) often degrades image quality, which motivates robust enhancement methods. Unpaired enhancement pipelines are typically GAN-based; however, they can distort clinically critical vasculature, altering vessel topology and endpoint integrity. Motivated by these structural alterations, we propose Vessel-Aware Optimal Transport (VAOT), a framework that combines an optimal-transport objective with two structure-preserving regularizers: (i) a skeleton-based loss to maintain global vascular connectivity and (ii) an endpoint-aware loss to stabilize local termini. These constraints guide learning in the unpaired setting, reducing noise while preserving vessel structure. Experimental results on a synthetic degradation benchmark and downstream evaluations in vessel and lesion segmentation demonstrate the superiority of the proposed method over several state-of-the-art baselines. The code is available at https://github.com/Retinal-Research/VAOT

Paper Structure

This paper contains 15 sections, 6 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Examples (A) and (B) illustrate the segmentation masks and corresponding skeleton maps (e.g., Skeleton($\mathbf{x}_1$)) for input low-quality images (e.g., $\mathbf{x}_1$) and their enhanced counterparts (e.g., $\hat{\mathbf{x}}_1$). Skeleton structure shifts and endpoint modifications are highlighted in yellow and red boxes, respectively. Here, the frozen generator $G_\theta$ comes from zhu2023optimal, chosen for its consistently strong performance reported in zhu2025eyebench, and the segmentation network comes from zhou2021study. Algorithms outlined in van2014scikit are used to extract the skeleton from each segmentation map. See Sec. \ref{sec:preliminary} for a more detailed analysis.
  • Figure 2: Illustration of the SGA module. Specifically, given a low-quality image $\mathbf{x}$ and its enhanced counterpart $\hat{\mathbf{x}}=G_\theta(\mathbf{x})$, we use their soft segmentation maps (i.e., sigmoid outputs) and the corresponding soft skeletons to regularize global shape and connectivity. The red box highlights regions where noise affects the results, and the green box illustrates the order of the skeletonization sequence. As the step index $i$ increases, centerlines of progressively thicker vessels are extracted. See Sec. \ref{subsec:sga} for details.
  • Figure 3: Illustration of the EVP module. Red circles mark vessel endpoints derived from the vessel skeleton, yellow boxes show the endpoint-centered windows in image space, and green dashed arrows indicate their correspondences. The local morphology regularization $C_e$ is computed over these batches of small, endpoint-centered windows. We denote this extraction-and-windowing operator by $En(\cdot)$. For simplicity, we overload the skeletonization notation $SK(\cdot)$ to denote a hard, binary skeleton. See Sec. \ref{subsec:endpoint} for details.
  • Figure 4: Illustration of the VAOT pipeline. The operator $SK(\cdot)$ denotes skeletonization, which is soft in Phase 1 and hard in Phase 2. Red circles mark detected endpoints and yellow arrows indicate the corresponding endpoint-centered local windows in image space. See Sec. \ref{subsec:final-target} for details.
  • Figure 5: Illustration of the main enhancement task on EyeQ. The first two columns show low- and high-quality image pairs. Column (A) shows results from paired algorithms, column (B) shows results from unpaired algorithms, and column (C) shows our results (i.e., VAOT). See Sec. \ref{subsec:result} for details.
  • ...and 3 more figures
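
The endpoint extraction and windowing described in the Figure 3 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `find_endpoints` and `crop_windows`, the window half-size `half`, and the endpoint rule (a skeleton pixel with exactly one 8-connected skeleton neighbor) are assumptions made here for illustration, and the paper's operator $En(\cdot)$ may differ in its details.

```python
import numpy as np


def find_endpoints(skeleton):
    """Return (row, col) of skeleton pixels with exactly one 8-connected neighbor.

    Assumption: this neighbor-count rule is a common endpoint definition,
    not necessarily the one used in the paper.
    """
    sk = (np.asarray(skeleton) > 0).astype(np.int32)
    h, w = sk.shape
    padded = np.pad(sk, 1)  # zero border so edge pixels also have 8 neighbors
    neighbors = np.zeros_like(sk)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the center pixel itself
            neighbors += padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
    rows, cols = np.nonzero((sk == 1) & (neighbors == 1))
    return [(int(r), int(c)) for r, c in zip(rows, cols)]


def crop_windows(image, endpoints, half=2):
    """Crop square windows of side 2*half+1 centered on each endpoint.

    The image is zero-padded so windows near the border keep a fixed size.
    `half` is a hypothetical parameter chosen for this sketch.
    """
    image = np.asarray(image)
    size = 2 * half + 1
    pad = [(half, half), (half, half)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad)
    windows = [padded[r:r + size, c:c + size] for r, c in endpoints]
    if not windows:
        return np.zeros((0, size, size) + image.shape[2:], dtype=image.dtype)
    return np.stack(windows)


# Toy usage: a short horizontal vessel centerline on a 7x7 grid.
skeleton = np.zeros((7, 7), dtype=np.uint8)
skeleton[3, 1:6] = 1
image = np.arange(49, dtype=np.float32).reshape(7, 7)

endpoints = find_endpoints(skeleton)
windows = crop_windows(image, endpoints, half=2)
```

In this toy case the two termini of the line are the only endpoints, and each window is centered on one of them. A local term such as $C_e$ would then compare the windows cropped at the same locations from the input and from the enhanced image; how that comparison is defined is left to the paper.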