FundusGAN: A Hierarchical Feature-Aware Generative Framework for High-Fidelity Fundus Image Generation

Qingshan Hou, Meng Wang, Peng Cao, Zou Ke, Xiaoli Liu, Huazhu Fu, Osmar R. Zaiane

TL;DR

FundusGAN tackles data scarcity in ophthalmology foundation models by introducing a hierarchical, feature-aware generation framework. It uses a Feature Pyramid Network–enhanced encoder to capture multi-scale retinal structures and a StyleGAN-inspired generator with dilated convolutions, guided by a latent content vector mapping and a delta-controlled skip mechanism. The method pairs regularization and perceptual/pixel-wise losses to produce anatomically faithful and lesion-rich fundus images, achieving state-of-the-art SSIM and lower FID/KID on DDR, DRIVE, and IDRiD, and enabling improved disease classification when synthetic data are added to training. Overall, FundusGAN demonstrates strong potential as a foundation model component to alleviate data requirements in ophthalmology and to support robust, data-efficient AI-assisted diagnostics.

Abstract

Recent advancements in ophthalmology foundation models such as RetFound have demonstrated remarkable diagnostic capabilities but require massive datasets for effective pre-training, creating significant barriers for development and deployment. To address this critical challenge, we propose FundusGAN, a novel hierarchical feature-aware generative framework specifically designed for high-fidelity fundus image synthesis. Our approach leverages a Feature Pyramid Network within its encoder to comprehensively extract multi-scale information, capturing both large anatomical structures and subtle pathological features. The framework incorporates a modified StyleGAN-based generator with dilated convolutions and strategic upsampling adjustments to preserve critical retinal structures while enhancing pathological detail representation. Comprehensive evaluations on the DDR, DRIVE, and IDRiD datasets demonstrate that FundusGAN consistently outperforms state-of-the-art methods across multiple metrics (SSIM: 0.8863, FID: 54.2, KID: 0.0436 on DDR). Furthermore, disease classification experiments reveal that augmenting training data with FundusGAN-generated images significantly improves diagnostic accuracy across multiple CNN architectures (up to 6.49% improvement with ResNet50). These results establish FundusGAN as a valuable foundation model component that effectively addresses data scarcity challenges in ophthalmological AI research, enabling more robust and generalizable diagnostic systems while reducing dependency on large-scale clinical data collection.
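The abstract mentions dilated convolutions in the generator without showing what dilation does. The sketch below is a generic one-dimensional illustration in plain Python, not the paper's implementation: the function names, the toy signal, and the kernel are all assumptions made for this example. It shows that a kernel of size k with dilation d covers d * (k - 1) + 1 input positions, so the receptive field grows without adding weights.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1-D cross-correlation with a dilated kernel.

    Illustrative only: real generators use learned 2-D kernels.
    """
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for k, weight in enumerate(kernel):
            # Taps are spaced `dilation` samples apart.
            acc += weight * signal[start + k * dilation]
        out.append(acc)
    return out


signal = [0, 1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical toy input
kernel = [1, 1, 1]                     # hypothetical 3-tap kernel

dense = dilated_conv1d(signal, kernel, dilation=1)  # each output sees 3 inputs
wide = dilated_conv1d(signal, kernel, dilation=2)   # each output sees a span of 5
```

With the same three weights, the dilated version aggregates context from a wider window, which is the usual motivation for using dilation where both fine detail and surrounding structure matter.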

Paper Structure

This paper contains 17 sections, 9 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Architecture overview of FundusGAN. The framework comprises two core components: (1) a hierarchical encoder that processes input fundus images to extract multi-scale feature maps (low-, mid-, and high-level), and (2) a generator that produces output images from the latent content vector space $w^+$ and the first-layer feature.
  • Figure 2: Color fundus retinal images generated by FundusGAN.
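The multi-scale feature maps described in the Figure 1 caption come from a Feature Pyramid Network style encoder. As a rough illustration of the top-down fusion such pyramids typically use, the sketch below upsamples each coarser map by 2x and adds it to the next finer map. It is a simplification under stated assumptions: the lateral 1x1 and smoothing 3x3 convolutions of a standard FPN are omitted, the maps are single-channel nested lists, and the names `upsample_nearest_2x` and `top_down_merge` are hypothetical, not taken from the paper.

```python
def upsample_nearest_2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map given as nested lists."""
    out = []
    for row in fmap:
        wide_row = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide_row)
        out.append(list(wide_row))                     # repeat each row
    return out


def top_down_merge(levels):
    """Fuse pyramid levels ordered finest to coarsest.

    Each level is assumed to have half the resolution of the previous one.
    Returns the fused maps in the same order.
    """
    merged = [levels[-1]]  # the coarsest level passes through unchanged
    for finer in reversed(levels[:-1]):
        up = upsample_nearest_2x(merged[0])
        fused = [[a + b for a, b in zip(r_fine, r_up)]
                 for r_fine, r_up in zip(finer, up)]
        merged.insert(0, fused)
    return merged


# Hypothetical toy pyramid: 4x4 (low-level), 2x2 (mid-level), 1x1 (high-level).
low = [[1] * 4 for _ in range(4)]
mid = [[10, 20], [30, 40]]
high = [[100]]

fused_low, fused_mid, fused_high = top_down_merge([low, mid, high])
```

After fusion, every position in the finest map carries a contribution from each coarser level, which is how a pyramid lets small lesions and large anatomical structures inform the same feature map.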