Deep Learning Ensemble for Predicting Diabetic Macular Edema Onset Using Ultra-Wide Field Color Fundus Image

Pengyao Qin, Arun J. Thirunavukarasu, Theodoros Arvanitis, Le Zhang

TL;DR

This work addresses the prediction of center-involved diabetic macular edema (ci-DME) onset within 12 months from ultra-widefield fundus images. It presents a deep-learning ensemble built from ResNet, DenseNet, EfficientNet, and VGG candidates, trained on synthetic DIAMOND data and evaluated with AUC, F1, and expected calibration error. The label-fusion ensemble achieves competitive calibration and discrimination, while a single DenseNet-121 shows strong standalone performance, highlighting a trade-off between peak accuracy and generalizability. The approach shows potential for scalable, proactive ci-DME screening, though it relies on synthetic data and must contend with cross-device variability; future directions include attention-based models to further improve clinical utility.

Abstract

Diabetic macular edema (DME) is a severe complication of diabetes, characterized by thickening of the central portion of the retina due to accumulation of fluid. DME is a significant and common cause of visual impairment in diabetic patients. Center-involved DME (ci-DME) is the highest-risk form of the disease because fluid extends close to the fovea, which is responsible for sharp central vision. Earlier diagnosis or prediction of ci-DME may improve treatment outcomes. Here, we propose an ensemble method to predict ci-DME onset within a year, developed using synthetic ultra-wide field color fundus photography (UWF-CFP) images provided by the DIAMOND Challenge. We adopted a variety of baseline state-of-the-art classification networks, including ResNet, DenseNet, EfficientNet, and VGG, with the aim of enhancing model robustness. The best-performing models were DenseNet-121, ResNet-152, and EfficientNet-b7, and these were assembled into a definitive predictive model. The final ensemble model demonstrates strong performance, with an Area Under the Curve (AUC) of 0.7017, an F1 score of 0.6512, and an Expected Calibration Error (ECE) of 0.2057 when deployed on the synthetic test dataset. Results from our ensemble model were superior or comparable to previously reported results in highly curated settings using conventional or ultra-wide field fundus photography. Optimal sensitivity in previous studies (using humans or computers to diagnose) ranges from 67.3% to 98%, and specificity from 47.8% to 80%. Therefore, our method can be used safely and effectively in a range of settings and may facilitate earlier diagnosis, better treatment decisions, and improved prognostication in ci-DME.
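
To make the fusion and calibration terms in the abstract concrete, the following is a minimal NumPy sketch of two common ways to combine per-model outputs (probability averaging and majority-vote label fusion) and of a binned Expected Calibration Error. It is illustrative only: the function names, the 0.5 decision threshold, and the 10 equal-width confidence bins are assumptions made here, not details confirmed by the paper.

```python
import numpy as np


def average_fusion(probs):
    """Average the positive-class probabilities across models (rows = models)."""
    return np.asarray(probs, dtype=float).mean(axis=0)


def label_fusion(probs, threshold=0.5):
    """Majority vote over each model's thresholded label (threshold is assumed)."""
    votes = (np.asarray(probs, dtype=float) >= threshold).astype(int)
    # A sample is positive when strictly more than half of the models vote positive.
    return (votes.sum(axis=0) * 2 > votes.shape[0]).astype(int)


def expected_calibration_error(p_pos, y_true, n_bins=10):
    """Binned ECE for a binary classifier; n_bins=10 is an assumed default."""
    p_pos = np.asarray(p_pos, dtype=float)
    y_true = np.asarray(y_true, dtype=int)
    pred = (p_pos >= 0.5).astype(int)
    # Confidence is the probability assigned to the predicted class.
    conf = np.where(pred == 1, p_pos, 1.0 - p_pos)
    correct = (pred == y_true).astype(float)
    # Equal-width bins over [0, 1]; interior edges give indices 0..n_bins-1.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(conf, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            # Weight each bin's |accuracy - confidence| gap by its share of samples.
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(ece)
```

In this sketch, `probs` would be a models-by-samples array of predicted ci-DME probabilities, for example one row each for three trained networks. Averaging keeps a continuous score that can be used for AUC and ECE, whereas label fusion yields hard labels suited to F1.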

Paper Structure

This paper contains 10 sections, 1 equation, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Diagram of an example of each ensemble strategy used in this study, with three individual models predicting ci-DME.
  • Figure 2: Examples of image processing employed during model development. Images were resized to 224 $\times$ 224 px, greyscaled, normalized, randomly reflected about the horizontal and vertical axes, and randomly rotated through 45°. The effect of each step is shown in its own panel.
  • Figure 3: Examples of ultra-wide field colour fundus photographs captured by different devices: Zeiss CLARUS (left) and OPTOS (right). The Challenge uses datasets from these two devices for evaluation. Significant differences in dimensions, resolution, and artifacts can confound classification and thereby place higher demands on the generalisability of a model that is to be trained and/or tested across both modalities.
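
The processing steps listed in the Figure 2 caption can be sketched in plain NumPy as below. This is a hedged illustration rather than the authors' pipeline: the nearest-neighbour resampling, the BT.601 greyscale weights, per-image standardization as the normalization, the 0.5 flip probability, and the reading of "through 45°" as a uniform angle in [-45°, 45°] are all assumptions, and every function name is hypothetical.

```python
import numpy as np


def resize_nearest(img, size=224):
    """Nearest-neighbour resize of an H x W x C image to size x size (assumed method)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]


def to_greyscale(img):
    """Collapse RGB to one channel using BT.601 luma weights (assumed weights)."""
    return img[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])


def rotate_nearest(img, angle_deg):
    """Rotate a 2-D image about its centre; uncovered pixels are set to zero."""
    h, w = img.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: locate the source pixel for every output pixel.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    sx = np.rint(src_x).astype(int)
    sy = np.rint(src_y).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out


def preprocess(img, rng, max_angle=45.0, train=True):
    """Resize, greyscale, normalize, then (training only) flip and rotate at random."""
    x = to_greyscale(resize_nearest(img, 224))
    # Per-image standardization; the paper's exact normalization is not stated here.
    x = (x - x.mean()) / (x.std() + 1e-8)
    if train:
        if rng.random() < 0.5:
            x = x[:, ::-1]  # reflect left-right
        if rng.random() < 0.5:
            x = x[::-1, :]  # reflect top-bottom
        x = rotate_nearest(x, rng.uniform(-max_angle, max_angle))
    return x
```

A call such as `preprocess(image, np.random.default_rng(0))` returns a 224 by 224 floating-point array. Because both devices in Figure 3 produce images of different dimensions, resizing first gives every downstream step a fixed input shape regardless of the source camera.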