LPUWF-LDM: Enhanced Latent Diffusion Model for Precise Late-phase UWF-FA Generation on Limited Dataset

Zhaojie Fang; Xiao Yu; Guanyu Zhou; Ke Zhuang; Yifei Chen; Ruiquan Ge; Changmiao Wang; Gangyong Jia; Qing Wu; Juan Ye; Maimaiti Nuliqiman; Peifang Xu; Ahmed Elazab

TL;DR

The paper addresses the challenge of noninvasively generating high-fidelity late-phase UWF-FA images from UWF-SLO when paired data are limited. It introduces LPUWF-LDM, a latent diffusion framework with three additions: a Gated Convolutional Encoder for the VAE, a Cross-temporal Regional Difference (CTRD) Loss that emphasizes lesion regions, and a Low-Frequency Enhanced Noise strategy that improves realism in ophthalmic images. With two-stage training on multicenter data and comprehensive ablations, the method achieves state-of-the-art or competitive performance on a proprietary UWF dataset, significantly improving metrics such as FID while preserving or enhancing IS, PSNR, and MS-SSIM. The work advances noninvasive retinal diagnostics by enabling accurate late-phase synthesis from limited data, potentially reducing the need for dye injections and enabling broader clinical deployment.

Abstract

Ultra-Wide-Field Fluorescein Angiography (UWF-FA) enables precise identification of ocular diseases, but it requires an injection of sodium fluorescein, which can be harmful. Existing research has developed methods to generate UWF-FA from Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) to reduce the adverse reactions associated with injections. However, these methods have been less effective in producing high-quality late-phase UWF-FA, particularly in lesion areas and fine details. Two primary challenges hinder the generation of high-quality late-phase UWF-FA: the scarcity of paired UWF-SLO and early/late-phase UWF-FA datasets, and the need for realistic generation at lesion sites and potential blood leakage regions. This study introduces an improved latent diffusion model framework to generate high-quality late-phase UWF-FA from limited paired UWF images. To address these challenges, our approach employs a module trained with a Cross-temporal Regional Difference Loss, which encourages the model to focus on the differences between the early and late phases. Additionally, we introduce a low-frequency enhanced noise strategy in the diffusion forward process to improve the realism of medical images. To further enhance the mapping capability of the variational autoencoder module, especially with limited datasets, we implement a Gated Convolutional Encoder to extract additional information from conditional images. Our Latent Diffusion Model for Ultra-Wide-Field Late-Phase Fluorescein Angiography (LPUWF-LDM) effectively reconstructs fine details in late-phase UWF-FA and achieves state-of-the-art results compared with existing methods when working with limited datasets. Our source code is available at: https://github.com/Tinysqua/****.
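
The abstract states only that the Cross-temporal Regional Difference Loss steers the model toward regions where the early and late phases differ; it does not give the formula. The sketch below is one plausible reading, not the authors' formulation: a per-pixel weight map derived from the absolute early/late difference, applied to a mean squared error. The function names, the `alpha` parameter, and the choice of a weighted MSE in pixel space are all assumptions made for illustration.

```python
import numpy as np

def difference_weight_map(early, late, alpha=1.0):
    """Per-pixel weights that grow with the absolute early/late difference.

    Hypothetical helper: weights lie in [1, 1 + alpha], so regions that
    change between phases (e.g. leakage) count more than static regions.
    """
    diff = np.abs(late.astype(np.float64) - early.astype(np.float64))
    peak = diff.max()
    if peak > 0:
        diff = diff / peak  # normalize to [0, 1]
    return 1.0 + alpha * diff

def weighted_mse(pred, target, weights):
    """Mean squared error where each pixel is scaled by its weight."""
    err = (pred.astype(np.float64) - target.astype(np.float64)) ** 2
    return float((weights * err).sum() / weights.sum())
```

With such a weighting, an error of a given size costs more when it falls on a pixel that changed between the early and late phase than when it falls on unchanged background.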

Paper Structure (16 sections, 6 equations, 7 figures, 2 tables)

Introduction
Related Work
Cross-modal Generation
Diffusion Models combined with VAE
Methods
Gated Convolutional Encoder For VAE
Cross-temporal Regional Difference Loss
Low Frequency Enhanced Noise
Preprocessing of UWF photos
Experiments
Dataset
Evaluation Metrics
Implementation Details
Comparison
Ablation Studies
...and 1 more section

Figures (7)

Figure 1: Overall architecture of LPUWF-LDM. It encompasses a VAE module with a Gated Convolutional Encoder, a noise addition module utilizing low-frequency enhanced noise, a conditional encoder module for input conditional images, and a backbone for noise prediction trained via CTRD Loss.
Figure 2: Details of the Gated Convolutional Encoder for the VAE framework. A comparison between the traditional VAE approach and our VAE method. Our method augments the original VAE framework with a Gated Convolutional Encoder, which consists of a Downsample Module and a Gated Module.
Figure 3: Two pairs of early-phase and late-phase UWF-FA images and their Cross-temporal Regional Differences. The left column shows early-phase UWF-FA, the middle column displays late-phase UWF-FA, and the rightmost column presents the Cross-temporal Differences.
Figure 4: High- and low-frequency division of noised UWF-FA using the Fourier transform. The top row shows the visualization of high and low frequencies with Low-Frequency Enhanced Noise applied, while the bottom row displays those without it.
Figure 5: Effects of UWF photo preprocessing. The top left corner demonstrates the effect of image sharpening, and the bottom right corner demonstrates the process of UWF-FA registration.
...and 2 more figures
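
The high/low-frequency division mentioned in the Figure 4 caption can be illustrated with a standard Fourier-domain split. This is a minimal sketch under assumptions: the circular low-pass mask, the `radius` cutoff, and the function name are illustrative choices, and the code shows only the decomposition used for visualization, not the Low-Frequency Enhanced Noise strategy itself.

```python
import numpy as np

def split_frequencies(image, radius):
    """Split a 2D image into low- and high-frequency parts via the FFT.

    Frequencies within `radius` of the spectrum center (after fftshift)
    form the low-frequency part; everything else is the high-frequency part.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    # Distance of every frequency bin from the zero-frequency (DC) bin.
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~mask)).real
    return low, high
```

Because the two masks are complementary, the low and high parts sum back to the original image, which makes the split convenient for comparing how much energy a noising scheme places in each band.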
