Enhancing Fundus Image-based Glaucoma Screening via Dynamic Global-Local Feature Integration
Yuzhuo Zhou, Chi Liu, Sheng Shen, Siyu Le, Liwen Yu, Sihan Ouyang, Zongyuan Ge
TL;DR
This work tackles glaucoma screening from fundus images under real-world variability by introducing a cross-attention three-branch architecture that fuses global, ROI-based local, and dynamic-window local features. A ResNet152-CBAM backbone powers feature extraction, with a pretrained ROI segmentation model guiding local analysis and a dynamic window mechanism selecting informative patches to mitigate boundary uncertainty. The approach achieves superior accuracy and robustness on the Rotterdam EyePACS AIROGS dataset, outperforming baseline architectures across AP, AUC, accuracy, and F1 while maintaining strong specificity. The method holds practical significance for reliable glaucoma detection across diverse imaging devices and populations, addressing both image quality variation and clinical boundary uncertainty.
Abstract
With advances in medical artificial intelligence (AI), fundus image classifiers are increasingly applied to assist ophthalmic diagnosis. Although existing classification models achieve high accuracy on specific fundus datasets, they struggle with real-world challenges: variation in image quality across imaging devices, discrepancies between training and testing images from different racial groups, and the uncertain boundaries characteristic of glaucomatous cases. In this study, we address these challenges by incorporating comprehensive fundus image information, including the optic cup (OC) and optic disc (OD) regions and other key image patches. Specifically, we propose a self-adaptive attention window that autonomously determines optimal boundaries for enhanced feature extraction. We also introduce a multi-head attention mechanism that fuses global and local features via feature linear readout, improving the model's discriminative capability. Experimental results demonstrate that our method achieves superior accuracy and robustness in glaucoma classification.
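To make the fusion step concrete, the following is a minimal NumPy sketch of multi-head cross-attention in which a single global feature vector queries a set of local feature vectors, followed by a linear readout to a glaucoma probability. It is an illustration under stated assumptions, not the implementation used in this work: the function names `cross_attention_fuse` and `linear_readout`, the residual connection, the feature dimension, and the random projection weights (standing in for learned parameters) are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(global_feat, local_feats, num_heads, rng):
    """Fuse a global feature (d,) with local features (n, d).

    The global feature acts as the query; local features (e.g. from the
    OC/OD region and selected patches) act as keys and values.
    Returns the fused vector (d,) and attention weights (num_heads, n).
    """
    d = global_feat.shape[0]
    if d % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = d // num_heads

    # Assumption: random matrices stand in for learned projection weights.
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)
    )

    # Split projections into heads: q is (h, 1, hd); k and v are (h, n, hd).
    q = (global_feat @ w_q).reshape(num_heads, 1, head_dim)
    k = (local_feats @ w_k).reshape(-1, num_heads, head_dim).transpose(1, 0, 2)
    v = (local_feats @ w_v).reshape(-1, num_heads, head_dim).transpose(1, 0, 2)

    # Scaled dot-product attention of the global query over local features.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (h, 1, n)
    weights = softmax(scores, axis=-1)
    attended = (weights @ v).reshape(d)  # concatenate heads -> (d,)

    # Assumption: a residual connection keeps the global information.
    fused = global_feat + attended @ w_o
    return fused, weights[:, 0, :]


def linear_readout(fused, w, b):
    """Map the fused feature to a probability with one linear layer and a sigmoid."""
    z = float(fused @ w + b)
    return 1.0 / (1.0 + np.exp(-z))
```

For example, with an 8-dimensional global feature and five local features, `cross_attention_fuse(g, loc, num_heads=2, rng=np.random.default_rng(0))` returns a fused vector of shape `(8,)` and a `(2, 5)` weight matrix whose rows each sum to 1, showing how strongly each head attends to each local region.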
