Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance

Weiyi Zhang, Siyu Huang, Jiancheng Yang, Ruoyu Chen, Zongyuan Ge, Yingfeng Zheng, Danli Shi, Mingguang He

TL;DR

The paper tackles the generation of dynamic FFA videos from static CF fundus images, motivated by the invasiveness and limited accessibility of FFA. It introduces Fundus2Video, an autoregressive GAN based on pix2pixHD that synthesizes the video from a single CF image one frame at a time, which keeps memory use low. Training is guided by an unsupervised knowledge mask (KM) derived from early and late FFA frames. The KM-guided components (knowledge-boosted attention, mask-enhanced PatchNCE loss, and knowledge-aware discriminators) improve fidelity in dynamic lesion and vascular regions while mitigating pixel misalignment; as guidance, the knowledge mask also outperforms supervised lesion segmentation masks. Experiments show the best FVD (1503.21) and PSNR (11.81) among the compared video generation methods, supported by an ophthalmologist's assessment, and the approach offers a promising non-invasive cross-modal angiography pathway for research and clinical use.

Abstract

Fundus Fluorescein Angiography (FFA) is a critical tool for assessing retinal vascular dynamics and aiding in the diagnosis of eye diseases. However, its invasive nature and lower accessibility compared to Color Fundus (CF) images pose significant challenges. Current CF-to-FFA translation methods are limited to static generation. In this work, we pioneer dynamic FFA video generation from static CF images. We introduce an autoregressive GAN for smooth, memory-saving frame-by-frame FFA synthesis. To enhance the focus on dynamic lesion changes in FFA regions, we design a knowledge mask based on clinical experience. Leveraging this mask, our approach integrates knowledge mask-guided techniques, including knowledge-boosted attention, knowledge-aware discriminators, and mask-enhanced patchNCE loss, aimed at refining generation in critical areas and addressing the pixel misalignment challenge. Our method achieves the best FVD of 1503.21 and PSNR of 11.81 compared to other common video generation approaches. Human assessment by an ophthalmologist confirms its high generation quality. Notably, our knowledge mask surpasses supervised lesion segmentation masks, offering a promising non-invasive alternative to traditional FFA for research and clinical applications. The code is available at https://github.com/Michi-3000/Fundus2Video.
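
For readers unfamiliar with the PSNR figure quoted in the abstract, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE), applied per frame. The function names `psnr` and `video_psnr`, the 8-bit peak value of 255, and the choice to average PSNR over frames are assumptions made for illustration; the abstract does not state how the authors aggregate the metric over a video.

```python
import numpy as np


def psnr(reference: np.ndarray, generated: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    ref = reference.astype(np.float64)
    gen = generated.astype(np.float64)
    mse = np.mean((ref - gen) ** 2)
    if mse == 0.0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


def video_psnr(ref_frames, gen_frames, max_value: float = 255.0) -> float:
    """Mean of per-frame PSNR values (this averaging convention is an assumption)."""
    scores = [psnr(r, g, max_value) for r, g in zip(ref_frames, gen_frames)]
    return float(np.mean(scores))
```

Higher PSNR means a smaller pixel-wise error against the real FFA frame, so it rewards exact alignment; FVD, by contrast, compares feature distributions of whole videos and lower values are better.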

Paper Structure (18 sections, 7 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: Proposed Fundus2Video. (a) The overall architecture. Generator $G$ generates one frame at a time, taking the output from the previous time step and the CF image as input. During the training phase, unsupervised knowledge masks guide the entire network. (b) The design of the mask-enhanced patchNCE loss.
  • Figure 2: The definition of the knowledge mask. Left: The unsupervised process of obtaining the mask. The knowledge mask covers the same pathological areas as the expert-labeled mask. Right: Generated results with and without the knowledge mask.
  • Figure 3: Qualitative comparison of the methods. Frames are sampled from the 12-frame video. Red boxes mark significant lesions. The KM-guided Fundus2Video shows the best performance in generating critical lesions.
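
The captions for Figures 1 and 2 describe two ideas that a short sketch may make more concrete: a generator that produces one frame per step from the CF image and its own previous output, and a mask computed without labels from early and late FFA frames. The code below is an illustration under stated assumptions, not the authors' implementation. In particular, `toy_generator` is a placeholder for the real network $G$, the all-zero initial frame is an assumption, and thresholding the absolute early/late difference is only one plausible reading of how the knowledge mask is obtained.

```python
import numpy as np


def knowledge_mask(early: np.ndarray, late: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Hypothetical mask: 1 where intensity changes by more than `threshold`
    between an early and a late FFA frame, 0 elsewhere (assumed rule)."""
    diff = np.abs(late.astype(np.float64) - early.astype(np.float64))
    return (diff > threshold).astype(np.float64)


def rollout(generator, cf_image: np.ndarray, num_frames: int = 12) -> np.ndarray:
    """Autoregressive generation: each step sees the CF image and the previous output.

    Only one previous frame is held as input at any step, which is the
    memory-saving property of frame-by-frame synthesis.
    """
    # Assumption: the first step receives a blank frame as its "previous output".
    prev = np.zeros(cf_image.shape[:2], dtype=np.float64)
    frames = []
    for _ in range(num_frames):
        prev = generator(cf_image, prev)
        frames.append(prev)
    return np.stack(frames)


def toy_generator(cf_image: np.ndarray, prev_frame: np.ndarray) -> np.ndarray:
    """Placeholder for G: brightens the previous frame slightly at each step."""
    return np.clip(prev_frame + 0.1 * cf_image.mean(axis=-1), 0.0, 1.0)


cf = np.ones((4, 4, 3))                      # stand-in CF image with values in [0, 1]
video = rollout(toy_generator, cf, num_frames=12)
mask = knowledge_mask(video[0], video[-1])   # early frame vs. late frame
```

In the paper the mask is used during training to guide the attention, the patchNCE loss, and the discriminators; here it is computed on the toy output only to show the shapes involved.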