Evaluating Fundus-Specific Foundation Models for Diabetic Macular Edema Detection
Franco Javier Arellano, José Ignacio Orlando
TL;DR
The paper tackles the problem of detecting diabetic macular edema (DME) from fundus images in data-scarce settings. It systematically compares the fundus-specific foundation models RETFound and FLAIR with a lightweight CNN backbone (EfficientNet-B0) under standard fine-tuning, linear probing, and zero-shot prediction across three public datasets, using AUC-PR and AUC-ROC as metrics. The findings show that foundation models do not consistently outperform fine-tuned CNNs: EfficientNet-B0 often achieves top performance, while zero-shot FLAIR achieves competitive results that depend on the prompt and the dataset. The work highlights the continued effectiveness of lightweight CNNs for fine-grained ophthalmic tasks and emphasizes the need for careful evaluation of foundation models, especially regarding cross-dataset generalization and prompt design for zero-shot use.
Abstract
Diabetic Macular Edema (DME) is a leading cause of vision loss among patients with Diabetic Retinopathy (DR). While deep learning has shown promising results for automatically detecting this condition from fundus images, its application remains challenging due to the limited availability of annotated data. Foundation Models (FMs) have emerged as an alternative solution. However, it is unclear whether they can cope with DME detection in particular. In this paper, we systematically compare different FMs and standard transfer learning approaches for this task. Specifically, we compare the two most popular FMs for retinal images (RETFound and FLAIR) and an EfficientNet-B0 backbone across different training regimes and evaluation settings in IDRiD, MESSIDOR-2 and OCT-and-Eye-Fundus-Images (OEFI). Results show that, despite their scale, FMs do not consistently outperform fine-tuned CNNs in this task. In particular, EfficientNet-B0 ranked first or second in terms of area under the ROC and precision/recall curves in most evaluation settings, with RETFound only showing promising results in OEFI. FLAIR, on the other hand, demonstrated competitive zero-shot performance, achieving notable AUC-PR scores when prompted appropriately. These findings suggest that FMs might not be a good tool for fine-grained ophthalmic tasks such as DME detection even after fine-tuning, and that lightweight CNNs remain strong baselines in data-scarce environments.
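For readers less familiar with the two metrics used to rank the models, the following is a minimal sketch of how the area under the ROC curve and the area under the precision/recall curve can be computed from binary labels and predicted scores. It is an illustration only and not the authors' evaluation code: the function names `auc_roc` and `average_precision` and the toy data are assumptions, and average precision is used here as one common estimator of AUC-PR (the exact estimator used in the paper is not stated in this section).

```python
def auc_roc(labels, scores):
    """AUC-ROC as the probability that a randomly chosen positive
    receives a higher score than a randomly chosen negative
    (tied pairs count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: the mean of the precision values measured at
    the rank of each positive sample. This simple version assumes there
    are no tied scores."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    # Rank samples from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Hypothetical toy data: 1 = DME present, 0 = DME absent.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

roc = auc_roc(labels, scores)
ap = average_precision(labels, scores)
print(f"AUC-ROC = {roc:.3f}, AUC-PR (average precision) = {ap:.3f}")
```

Both functions depend only on how the scores rank the samples, so no decision threshold has to be chosen. Unlike AUC-ROC, average precision ignores true negatives, so its value for an uninformative classifier depends on the fraction of positive cases in the dataset.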
