
Fundus image enhancement through direct diffusion bridges

Sehui Kim, Hyungjin Chung, Se Hie Park, Eui-Sang Chung, Kayoung Yi, Jong Chul Ye

TL;DR

FD3 tackles the challenging problem of restoring degraded fundus images with a direct diffusion bridge that starts from the observed measurement and moves smoothly toward a high-quality pseudo-ground-truth. It combines a synthetic forward model, tuned with feedback from ophthalmologists, and CLAHE-enhanced pseudo-ground-truth targets with a stand-alone diffusion network trained to predict posterior means across a continuum of timesteps, so enhancement runs end to end without a pre-trained refiner. Extensive simulations and in-vivo studies with ophthalmologists show better perceptual and quantitative performance than multiple baselines, including diffusion-based refiners. The approach needs only a small number of function evaluations (NFE), which keeps computational cost low, and provides a practical, open-source framework for advancing retinal diagnostics and downstream tasks such as vessel segmentation.
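The bridge described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. `x0_pseudo` stands for an image that has already been CLAHE-enhanced. `t` runs from 0 (pseudo-ground-truth) to 1 (measurement). The sampler uses a deterministic InDI-style update, which may differ from the exact FD3 inference rule.

```python
import numpy as np


def bridge_point(x0_pseudo, y, t):
    """Convex combination on the straight bridge: t=0 gives the target, t=1 the measurement."""
    return (1.0 - t) * x0_pseudo + t * y


def posterior_mean_loss(model, x0_pseudo, y, rng):
    """One training sample: draw t, build x_t, and regress model(x_t, t) onto the target.

    Minimising this mean-squared error over a dataset drives model(x_t, t) toward E[x0 | x_t].
    """
    t = rng.uniform(1e-3, 1.0)  # avoid t = 0, where x_t already equals the target
    x_t = bridge_point(x0_pseudo, y, t)
    return float(np.mean((model(x_t, t) - x0_pseudo) ** 2))


def enhance(y, model, num_steps=5):
    """Deterministic sampler that starts at x_1 = y and walks the bridge down to t = 0.

    The network is called once per step, so NFE equals num_steps in this sketch.
    """
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    x = np.array(y, dtype=np.float64)
    for t, s in zip(ts[:-1], ts[1:]):
        x0_hat = model(x, t)  # approximate posterior mean E[x0 | x_t]
        # Move from time t to an earlier time s while keeping the convex-combination form.
        x = (s / t) * x + (1.0 - s / t) * x0_hat
    return x


# Toy check with an oracle "network" that always returns the true target.
rng = np.random.default_rng(0)
x0_demo = rng.uniform(size=(8, 8))
y_demo = 0.5 * x0_demo + 0.3  # stand-in for a degraded measurement
restored = enhance(y_demo, lambda x_t, t: x0_demo, num_steps=5)
print(np.allclose(restored, x0_demo))  # True
```

With the oracle, every intermediate iterate lands exactly on the bridge, because (s/t)((1-t)x0 + t y) + (1 - s/t)x0 = (1-s)x0 + s y. A trained $F_\theta$ only approximates the posterior mean, so real iterates drift off the bridge. The few-step loop is where the low NFE comes from.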

Abstract

We propose FD3, a fundus image enhancement method based on direct diffusion bridges, which can cope with a wide range of complex degradations, including haze, blur, noise, and shadow. We first propose a synthetic forward model through a human feedback loop with board-certified ophthalmologists for maximal quality improvement of low-quality in-vivo images. Using the proposed forward model, we train a robust and flexible diffusion-based image enhancement network that is highly effective as a stand-alone method, unlike previous diffusion model-based approaches, which act only as refiners on top of pre-trained models. Through extensive experiments, we show that FD3 achieves superior quality not only on synthetic degradations but also in in-vivo studies with low-quality fundus photos taken from patients with cataracts or small pupils. To promote further research in this area, we open-source all code and data used in this research at https://github.com/heeheee888/FD3
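The abstract names the degradations the forward model must cover (haze, blur, noise, and shadow). It does not give their parameters, which were tuned with ophthalmologist feedback. The sketch below chains generic textbook versions of these four effects on a grayscale image in [0, 1], purely for illustration. The operators, their order, the function names, and every default value are assumptions. This is not the FD3 forward model.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Separable blur with zero padding (borders darken slightly); image must exceed kernel size."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)


def degrade(img, rng, blur_sigma=2.0, transmission=0.7, airlight=0.9,
            shadow_strength=0.4, noise_std=0.02):
    """Hypothetical degradation chain: shadow -> blur -> haze -> noise, then clip to [0, 1]."""
    out = np.asarray(img, dtype=np.float64)
    width = out.shape[1]
    # Shadow: multiplicative illumination ramp, darkest at the right edge.
    ramp = np.linspace(0.0, 1.0, width)[None, :]
    out = out * (1.0 - shadow_strength * ramp)
    # Blur: Gaussian point-spread function (skipped when blur_sigma <= 0).
    if blur_sigma > 0:
        out = gaussian_blur(out, blur_sigma)
    # Haze: atmospheric scattering model I = J * t + A * (1 - t).
    out = transmission * out + airlight * (1.0 - transmission)
    # Noise: additive white Gaussian noise.
    if noise_std > 0:
        out = out + rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out, 0.0, 1.0)
```

A synthetic training pair in the spirit of the paper would apply such a chain to a high-quality image to get the measurement y, and apply CLAHE to the same image to get the pseudo-ground-truth. Setting each strength to zero turns the corresponding effect off, which makes the chain easy to test one effect at a time.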

Paper Structure

This paper contains 18 sections, 22 equations, 10 figures, and 4 tables.

Figures (10)

  • Figure 1: (a) Training of FD3. CLAHE-applied high-quality images $\mathcal{C}(\bm{x}_0)$ are used as pseudo-ground-truth. Intermediate states $\bm{x}_t$ are sampled at random as convex combinations of $\mathcal{C}(\bm{x}_0)$ and the measurement $\bm{y}$, and the neural network $F_\theta$ is trained to map any $\bm{x}_t$ close to $\mathcal{C}(\bm{x}_0)$. (b) Inference (sampling) of FD3. Starting directly from $\bm{x}_1 = \bm{y}$, the trained network $F_\theta$ iteratively refines the posterior-mean estimate following the FD3 inference update. At every timestep, an approximate posterior mean $\mathbb{E}[\bm{x}_0|\bm{x}_t]$ is produced as the direct output of $F_\theta$.
  • Figure 2: (a) Before and (b) after applying CLAHE to "ground-truth" images. First row: drusen marked with yellow arrows. Second row: hemorrhage and microaneurysm marked with yellow arrows.
  • Figure 3: Schematic illustration of (a) standard conditional diffusion and (b) FD3. FD3 follows a direct, smoother trajectory from $p(\bm{y})$ to $p(\bm{x}_0)$, whereas the standard diffusion path starts from uninformative Gaussian noise.
  • Figure 4: (Simulation study) Comparison of image enhancement quality under our proposed forward model. Columns 1 to 3: EyeQ dataset; column 4: FPE dataset. Results are shown for CycleGAN [zhu2017unpaired], PCE-Net [Liu_2022], BlindDPS [chung2023parallel], LED [cheng2023learning], FD3 (Ours), and the ground truth. Yellow numbers in the top-left corner: PSNR.
  • Figure 5: Downstream vessel segmentation performance, evaluated with a pre-trained Iter-Net model [9093621]. Yellow numbers in the bottom-left corner: IoU.
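Figures 4 and 5 report PSNR and IoU. For reference, the standard definitions of both metrics are shown below. The captions do not state the paper's evaluation settings, so three things here are assumptions: the data range of 1.0, the use of boolean vessel masks, and the 0.5 threshold in the toy example.

```python
import numpy as np


def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks (defined as 1.0 when both are empty)."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    true_mask = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred_mask, true_mask).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred_mask, true_mask).sum() / union)


# Toy example: a vessel probability map binarised at an assumed threshold of 0.5.
prob_map = np.array([[0.9, 0.2], [0.7, 0.1]])
gt_mask = np.array([[1, 1], [0, 0]])
print(round(iou(prob_map > 0.5, gt_mask), 4))  # 0.3333
```

PSNR rises by 10 dB each time the mean squared error falls by a factor of 10. IoU ignores true negatives, which matters here because background pixels vastly outnumber vessel pixels in fundus images.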