Generative artificial intelligence in ophthalmology: multimodal retinal images for the diagnosis of Alzheimer's disease with convolutional neural networks

I. R. Slootweg, M. Thach, K. R. Curro-Tafili, F. D. Verbraak, F. H. Bouwman, Y. A. L. Pijnenburg, J. F. Boer, J. H. P. de Kwisthout, L. Bagheriye, P. J. González

TL;DR

This work addresses non-invasive Alzheimer's disease (AD) screening by predicting AmyloidPET status from multimodal retinal imaging using CNNs. It introduces a diffusion-based generative framework (DDPM) to synthesize four retinal modalities, a filter to ensure realism, and unimodal plus multimodal classifiers with optional metadata fusion. Pretraining CNNs on synthetic data improved the area under the precision-recall curve (AUPR) for some modalities, with the largest gain from 0.350 to 0.579, and metadata fusion yielded the best overall performance on the test set (AUPR ≈ 0.634, AUROC ≈ 0.729). The work demonstrates the potential of synthetic data in low-sample regimes and highlights interpretable retinal regions via GradCAM, suggesting a path toward cost-effective community AD screening.

Abstract

Background/Aim. This study aims to predict Amyloid Positron Emission Tomography (AmyloidPET) status with multimodal retinal imaging and convolutional neural networks (CNNs) and to improve the performance through pretraining with synthetic data. Methods. Fundus autofluorescence, optical coherence tomography (OCT), and OCT angiography images from 328 eyes of 59 AmyloidPET positive subjects and 108 AmyloidPET negative subjects were used for classification. Denoising Diffusion Probabilistic Models (DDPMs) were trained to generate synthetic images. Unimodal CNNs were either pretrained on synthetic data and finetuned on real data, or trained solely on real data. Multimodal classifiers were developed to combine predictions of the four unimodal CNNs with patient metadata. Class activation maps of the unimodal classifiers provided insight into the network's attention to inputs. Results. DDPMs generated diverse, realistic images without memorization. Pretraining unimodal CNNs with synthetic data improved AUPR, with the largest improvement from 0.350 to 0.579. Integration of metadata in multimodal CNNs improved AUPR from 0.486 to 0.634, making it the best classifier overall. Class activation maps highlighted relevant retinal regions that correlated with AD. Conclusion. Our method for generating and leveraging synthetic data has the potential to improve AmyloidPET prediction from multimodal retinal imaging. A DDPM can generate realistic and unique multimodal synthetic retinal images. Our best performing unimodal and multimodal classifiers were not pretrained on synthetic data; however, pretraining with synthetic data slightly improved classification performance for two out of the four modalities.

Paper Structure (16 sections, 10 figures, 3 tables)

This paper contains 16 sections, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Illustration of the pipeline. (Top): Synthetic images were generated by a DDPM. The synthetic images for which the filter could recognize the modality were included in the training budget of $1000$ synthetic images per class. Both synthetic and real images were used to train unimodal classifiers for predicting AmyloidPET status. We created 'baseline' unimodal classifiers trained on real images, and 'pretrained' unimodal classifiers pretrained on $1000$ synthetic images per class and finetuned on real images. Unimodal classifiers were not trained with metadata inputs because the synthetic data used for pretraining has no associated age and gender. (Bottom): We compared multimodal classifiers built on the baseline and on the pretrained unimodal classifiers. The weights of the unimodal classifiers were fixed after training. A three-layer fully connected network (FC) performed late heterogeneous fusion of the unimodal predictions and metadata into one AmyloidPET probability prediction. If metadata was included as input, age (scaled by $0.01$) and gender (binary) were also fed to the FC. Outputs of the unimodal and multimodal classifiers were scored between 0 and 1 for the probability of AmyloidPET negative status.
  • Figure 2: Examples of the synthetic images with the highest correlation to any real image. Each synthetic image is displayed next to the real image it most closely resembles, together with a scatter plot of the pixel values and the correlation value. (a) OCTA-SMAC; (b) OCT-BONH; (c) OCT-BMAC; (d) FAF. For OCT-BMAC and OCT-BONH the synthetic images strongly resembled the real images but were not exact copies. For OCTA-SMAC and FAF the images with the highest correlations showed less resemblance. This was also reflected by the lower correlation values for OCTA-SMAC. The distribution of maximum correlation values was highest for FAF. These images consisted of a predominantly grey background, which contributed to a high correlation between any two images of this modality.
  • Figure 3: (A-D): Distributions of maximum pearsonr values computed for $200$ synthetic images and all real images. Distributions display the highest correlation for image pairs among real images (RvR, orange), among synthetic images (SvS, green) and for all synthetic images with any real image (SvR, blue). Arrows indicate the differences between SvR and RvR distributions (black) and the differences between RvR and SvS distributions (grey). WD (Wasserstein distance) values express the distance between two distributions, and Kolmogorov-Smirnov (KS) test p-values give the significance of such differences. pearsonr = Pearson's correlation coefficient. ** = p < 0.005
  • Figure S1: Examples of generated synthetic images and real images for eight modalities. Top row from left to right: FRG, OCTA-SONH, OCTA-SMAC, OCT-BMAC, OCTA-EMAC. Bottom row from left to right: FAF, OCTA-DONH, OCTA-DMAC, OCT-BONH, OCTA-EONH. The synthetic FAF images best resembled their real counterparts, often with accurate branching of the blood vessels and even replication of the eyelashes at the periphery of the image. Synthetic FRG images failed to replicate the vasculature. Furthermore, these images were not sharp, and most of them lacked accurate colors, appearing green or yellow like the depth-encoded images. Most of the OCT-A images failed to replicate accurate branching of the blood vessels. OCT B-Scan images were overall quite realistic, although one or two retinal layers were sometimes duplicated. Depth-encoded OCT-A synthetic images were the least realistic, with malformations in the vasculature as well as in the coloring.
  • Figure S2: Examples of excluded fundus images. The two most common reasons for exclusion were insufficient focus (a) and coverage of the fundus by the eyelid or eyelashes (b).
  • ...and 5 more figures
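
The memorization check behind Figures 2 and 3 can be sketched as follows: for each synthetic image, take the highest Pearson correlation with any real image (SvR), and compare that distribution with the same statistic among real images (RvR) and among synthetic images (SvS). The sketch below is an illustration, not the authors' code. It assumes that WD denotes the Wasserstein distance, treats images as flattened pixel lists, and uses small random toy data with one deliberately copied image. It omits the KS test and restricts the Wasserstein distance to equal-sized samples.

```python
import math
import random


def pearson_r(a, b):
    """Pearson correlation between two flattened images of equal length."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # undefined for a constant image; treated as 0 here
    return cov / math.sqrt(var_a * var_b)


def max_correlations(queries, references, skip_same_index=False):
    """For each query image, the highest Pearson r to any reference image."""
    result = []
    for i, query in enumerate(queries):
        best = max(
            pearson_r(query, ref)
            for j, ref in enumerate(references)
            if not (skip_same_index and i == j)
        )
        result.append(best)
    return result


def wasserstein_1d(u, v):
    """1-D Wasserstein distance for two samples of equal size."""
    if len(u) != len(v):
        raise ValueError("this simplified version needs equal-sized samples")
    return sum(abs(x - y) for x, y in zip(sorted(u), sorted(v))) / len(u)


rng = random.Random(0)


def random_image(n_pixels=64):
    return [rng.random() for _ in range(n_pixels)]


# Toy data: 5 "real" images, 4 novel "synthetic" images and 1 memorized copy.
real = [random_image() for _ in range(5)]
synthetic = [random_image() for _ in range(4)] + [list(real[0])]

svr = max_correlations(synthetic, real)                         # synthetic vs real
rvr = max_correlations(real, real, skip_same_index=True)        # real vs real
svs = max_correlations(synthetic, synthetic, skip_same_index=True)

print("max r of the copied image:", round(svr[-1], 3))
print("WD(SvR, RvR):", round(wasserstein_1d(svr, rvr), 3))
```

In this toy setup the copied image stands out with a maximum correlation of about 1, while the novel images stay far lower. As the FAF caption notes, a shared uniform background can raise correlations between any two images, so the SvR distribution is best read relative to the RvR distribution rather than against a fixed threshold.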