Enhancing Fundus Image-based Glaucoma Screening via Dynamic Global-Local Feature Integration

Yuzhuo Zhou, Chi Liu, Sheng Shen, Siyu Le, Liwen Yu, Sihan Ouyang, Zongyuan Ge

TL;DR

This work tackles glaucoma screening from fundus images under real-world variability by introducing a cross-attention three-branch architecture that fuses global, ROI-based local, and dynamic-window local features. A ResNet152-CBAM backbone powers feature extraction, with a pretrained ROI segmentation model guiding local analysis and a dynamic window mechanism selecting informative patches to mitigate boundary uncertainty. The approach achieves superior accuracy and robustness on the Rotterdam EyePACS AIROGS dataset, outperforming baseline architectures across AP, AUC, accuracy, and F1 while maintaining strong specificity. The method holds practical significance for reliable glaucoma detection across diverse imaging devices and populations, addressing both image quality variation and clinical boundary uncertainty.

Abstract

With the advancements in medical artificial intelligence (AI), fundus image classifiers are increasingly being applied to assist in ophthalmic diagnosis. While existing classification models have achieved high accuracy on specific fundus datasets, they struggle to address real-world challenges such as variations in image quality across different imaging devices, discrepancies between training and testing images across different racial groups, and the uncertain boundaries due to the characteristics of glaucomatous cases. In this study, we aim to address the above challenges posed by image variations by highlighting the importance of incorporating comprehensive fundus image information, including the optic cup (OC) and optic disc (OD) regions, and other key image patches. Specifically, we propose a self-adaptive attention window that autonomously determines optimal boundaries for enhanced feature extraction. Additionally, we introduce a multi-head attention mechanism to effectively fuse global and local features via feature linear readout, improving the model's discriminative capability. Experimental results demonstrate that our method achieves superior accuracy and robustness in glaucoma classification.
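The fusion step described in the abstract can be illustrated with a small NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the global embedding acts as the query, the local patch embeddings act as keys and values, and a single linear layer reads out a referable-glaucoma probability. The feature dimension, number of patches, number of heads, the residual connection, the random weights, and the names `cross_attention_fuse` and `init_params` are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(d, rng):
    """Random projection matrices (assumption: stand-ins for trained weights)."""
    scale = 1.0 / np.sqrt(d)
    return {name: rng.normal(0.0, scale, (d, d)) for name in ("Wq", "Wk", "Wv", "Wo")}


def cross_attention_fuse(global_feat, local_feats, num_heads, params):
    """Fuse one global embedding (d,) with n local embeddings (n, d).

    The global feature is the query; local features are keys and values.
    Returns the fused embedding (d,) and attention weights (num_heads, n).
    """
    d = global_feat.shape[0]
    assert d % num_heads == 0, "feature dim must be divisible by num_heads"
    dh = d // num_heads

    q = (global_feat @ params["Wq"]).reshape(num_heads, dh)       # (h, dh)
    k = (local_feats @ params["Wk"]).reshape(-1, num_heads, dh)   # (n, h, dh)
    v = (local_feats @ params["Wv"]).reshape(-1, num_heads, dh)   # (n, h, dh)

    scores = np.einsum("hd,nhd->hn", q, k) / np.sqrt(dh)          # (h, n)
    weights = softmax(scores, axis=-1)                            # each row sums to 1
    attended = np.einsum("hn,nhd->hd", weights, v).reshape(d)     # (d,)

    # Residual connection back to the global feature (an assumption).
    fused = attended @ params["Wo"] + global_feat
    return fused, weights


rng = np.random.default_rng(0)
d, n_patches, heads = 64, 5, 4            # hypothetical sizes
params = init_params(d, rng)
global_feat = rng.normal(size=d)          # stand-in for the global-branch output
local_feats = rng.normal(size=(n_patches, d))  # stand-in for local patch embeddings

fused, weights = cross_attention_fuse(global_feat, local_feats, heads, params)

# Linear readout to a single probability (weights are random placeholders).
w_cls = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)
prob = 1.0 / (1.0 + np.exp(-(fused @ w_cls)))
print(fused.shape, weights.shape)
```

Using the global feature as the query lets each head weight the local patches by their relevance to the whole-image context; other arrangements, such as local queries over global keys, are equally plausible readings of the abstract.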

Paper Structure

This paper contains 15 sections, 4 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The motivation of our method: detecting glaucoma in alignment with clinical standards.
  • Figure 2: The architecture of our model. The global branch extracts global features, and the local branch extracts subtle features from local patches selected by the DWM. The feature fusion module fuses the global and local feature embeddings for classification.
  • Figure 3: Each sample shows the full image, the segmented ROI, ROI 800 (in a higher view), and ROI 800 CLAHE (Contrast Limited Adaptive Histogram Equalization), which is our final input. Red boxes mark the target ROI, yellow boxes mark the effect of the higher view, and blue boxes mark the improved image contrast.
  • Figure 4: The predicted results of our proposed model.
  • Figure 5: Misclassifications in both the 'referable' and 'non-referable' categories. Yellow boxes highlight regions where various factors may have misled the model.
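
The ROI preprocessing shown in Figure 3 could be sketched roughly as follows. This is a hedged illustration only: it assumes that "ROI 800" denotes a fixed 800 x 800 pixel crop centred on the segmented region, which the captions do not state explicitly. The function name `crop_around_roi`, the image size, and the synthetic mask are hypothetical, and the CLAHE contrast step is left out.

```python
import numpy as np


def crop_around_roi(image, mask, size):
    """Crop a size x size window centred on the mask centroid.

    The window is shifted as needed so that it stays inside the image.
    Falls back to the image centre when the mask is empty.
    Returns the crop and its (top, left) offset in the original image.
    """
    h, w = mask.shape
    if size > h or size > w:
        raise ValueError("crop size exceeds image dimensions")

    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        cy, cx = h // 2, w // 2
    else:
        cy, cx = int(ys.mean()), int(xs.mean())

    # Clamp the window so it never leaves the image.
    top = min(max(cy - size // 2, 0), h - size)
    left = min(max(cx - size // 2, 0), w - size)
    return image[top:top + size, left:left + size], (top, left)


# Synthetic example: a 1000 x 1200 image with an ROI near the top-right corner.
image = np.zeros((1000, 1200, 3), dtype=np.uint8)
mask = np.zeros((1000, 1200), dtype=bool)
mask[100:200, 900:1100] = True

crop, (top, left) = crop_around_roi(image, mask, 800)
print(crop.shape, top, left)
```

Clamping the window rather than padding keeps every crop the same size without introducing artificial borders, at the cost of the ROI sitting off-centre when it lies near the image edge.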