LPUWF-LDM: Enhanced Latent Diffusion Model for Precise Late-phase UWF-FA Generation on Limited Dataset
Zhaojie Fang, Xiao Yu, Guanyu Zhou, Ke Zhuang, Yifei Chen, Ruiquan Ge, Changmiao Wang, Gangyong Jia, Qing Wu, Juan Ye, Maimaiti Nuliqiman, Peifang Xu, Ahmed Elazab
TL;DR
The paper addresses the challenge of noninvasively generating high-fidelity late-phase UWF-FA images from UWF-SLO under limited paired data. It introduces LPUWF-LDM, a latent diffusion framework augmented with a Gated Convolutional Encoder for the VAE, a Cross-temporal Regional Difference Loss to emphasize lesion regions, and a Low-Frequency Enhanced Noise strategy to improve realism in ophthalmic images. Through two-stage training on multicenter data and comprehensive ablations, the method achieves state-of-the-art or competitive performance on a proprietary UWF dataset, significantly improving metrics such as FID while preserving or enhancing IS, PSNR, and MS-SSIM. The work advances noninvasive retinal diagnostics by enabling accurate late-phase synthesis with limited data, potentially reducing the need for dye injections and enabling broader clinical deployment.
Abstract
Ultra-Wide-Field Fluorescein Angiography (UWF-FA) enables precise identification of ocular diseases but requires an injection of sodium fluorescein, which is potentially harmful. Prior work has developed methods to generate UWF-FA from Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) to reduce the adverse reactions associated with injections. However, these methods have been less effective at producing high-quality late-phase UWF-FA, particularly in lesion areas and fine details. Two primary challenges hinder the generation of high-quality late-phase UWF-FA: the scarcity of paired UWF-SLO and early/late-phase UWF-FA datasets, and the need for realistic generation at lesion sites and potential blood leakage regions. This study introduces an improved latent diffusion model framework to generate high-quality late-phase UWF-FA from limited paired UWF images. To address these challenges, our approach employs a module based on a Cross-temporal Regional Difference Loss, which encourages the model to focus on the differences between the early and late phases. Additionally, we introduce a low-frequency enhanced noise strategy in the diffusion forward process to improve the realism of the generated medical images. To further enhance the mapping capability of the variational autoencoder module, especially with limited datasets, we implement a Gated Convolutional Encoder to extract additional information from conditional images. Our Latent Diffusion Model for Ultra-Wide-Field Late-Phase Fluorescein Angiography (LPUWF-LDM) effectively reconstructs fine details in late-phase UWF-FA and achieves state-of-the-art results compared with existing methods when working with limited datasets. Our source code is available at: https://github.com/Tinysqua/****.
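
The abstract names two of the ideas without giving formulas, so the following NumPy sketch is only an illustration of what they could look like, not the authors' formulation. It assumes that "low-frequency enhanced noise" means white Gaussian noise blended with a low-pass filtered noise field, and that the regional difference loss weights a per-pixel L1 error by the absolute early/late difference. The function names, the blend weight alpha, and the frequency cutoff are all hypothetical.

```python
import numpy as np


def low_frequency_enhanced_noise(shape, alpha=0.5, cutoff=0.1, rng=None):
    """Blend white noise with low-pass filtered noise (assumed formulation)."""
    rng = np.random.default_rng() if rng is None else rng
    white = rng.standard_normal(shape)
    h, w = shape[-2:]
    # Radial frequency grid in cycles per pixel over the last two axes.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    mask = (np.sqrt(fx**2 + fy**2) <= cutoff).astype(float)
    # Independent noise field restricted to the low-frequency band.
    low = np.fft.ifft2(np.fft.fft2(rng.standard_normal(shape)) * mask).real
    low /= low.std() + 1e-8
    mixed = white + alpha * low
    # Rescale so the result keeps unit standard deviation overall.
    return mixed / mixed.std()


def regional_difference_loss(pred, late, early, eps=1e-8):
    """L1 error weighted by normalized |late - early| (assumed formulation)."""
    weight = np.abs(late - early)
    weight = weight / (weight.sum() + eps)
    return float((weight * np.abs(pred - late)).sum())


if __name__ == "__main__":
    noise = low_frequency_enhanced_noise((4, 64, 64), rng=np.random.default_rng(0))
    print(noise.shape, round(float(noise.std()), 3))
```

Under these assumptions, the noise keeps unit variance but carries more energy at coarse spatial scales, and prediction errors in regions that change between the early and late phases (for example, leakage sites) are penalized more than errors elsewhere.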
