Guided Context Gating: Learning to leverage salient lesions in retinal fundus images
Teja Krishna Cherukuri, Nagur Shareef Shaik, Dong Hye Ye
TL;DR
This work addresses the challenge of representing retinal fundus images for diabetic retinopathy by introducing Guided Context Gating, a modular attention mechanism that jointly learns global context, spatial correlations, and lesion-specific local context. The method combines a Convolutional Base (EfficientNetV2B0), Context Formulation, Channel Correlation, and Guided Gating, followed by a Regularized Classification Head to robustly classify DR severity even with imbalanced data. Empirical results on Zenodo-DR-7 (and additional datasets) show higher accuracy and AUC than competing attention mechanisms and Vision Transformers, along with improved explainability via lesion-focused attention maps and discrimination of intra-similar lesions. The approach demonstrates strong potential for clinical deployment and can extend to other medical imaging tasks requiring precise localization of salient pathology.
Abstract
Effectively representing medical images, especially retinal images, is a considerable challenge because pathological signs, called lesions, vary in appearance, size, and contextual information. Precise discrimination of these lesions is crucial for diagnosing vision-threatening conditions such as diabetic retinopathy. While visual attention-based neural networks have been introduced to learn spatial context and channel correlations from retinal images, they often fall short in capturing localized lesion context. To address this limitation, we propose a novel attention mechanism called Guided Context Gating, which integrates Context Formulation, Channel Correlation, and Guided Gating to learn global context, spatial correlations, and localized lesion context. Our qualitative evaluation against existing attention mechanisms emphasizes the superiority of Guided Context Gating in terms of explainability. Notably, experiments on the Zenodo-DR-7 dataset show a 2.63% accuracy improvement over advanced attention mechanisms and a 6.53% improvement over the state-of-the-art Vision Transformer for assessing the severity grade of retinopathy, even with imbalanced and limited training samples for each class.
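To make the three stages named above more concrete, the following is a minimal NumPy sketch of one way a context-then-gate attention block could be wired together. It is not the authors' formulation: the softmax-weighted pooling used for the global context, the bottleneck used for channel correlation, the sigmoid gate, and every name in it (gated_context_attention, w_key, w_down, w_up) are assumptions made purely for illustration. In practice the input would be a feature map from the convolutional base, and the weights would be learned rather than random.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_context_attention(features, w_key, w_down, w_up):
    """Illustrative context-gating block (hypothetical, not the paper's method).

    features: (H, W, C) feature map from a convolutional backbone
    w_key:    (C,)   projection used to score each spatial location
    w_down:   (C, R) bottleneck projection, R < C
    w_up:     (R, C) projection back to C channels
    Returns the gated feature map (H, W, C) and the spatial gate (H, W).
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)  # (N, C) with N = H * W

    # 1) Global context (assumed): softmax-weighted pooling over locations.
    scores = softmax(flat @ w_key, axis=0)  # (N,), sums to 1
    context = scores @ flat                 # (C,)

    # 2) Channel correlation (assumed): ReLU bottleneck over the context.
    channel = np.maximum(context @ w_down, 0.0) @ w_up  # (C,)

    # 3) Gating (assumed): each location is scaled by how strongly its
    #    local features agree with the channel-refined global context.
    gate = sigmoid(flat @ channel / np.sqrt(c))  # (N,), values in [0, 1]
    gated = flat * gate[:, None]

    return gated.reshape(h, w, c), gate.reshape(h, w)


# Usage with random weights and a random 7x7x16 feature map.
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(7, 7, 16))
gated_map, gate_map = gated_context_attention(
    feature_map,
    w_key=rng.normal(size=16),
    w_down=rng.normal(size=(16, 4)),
    w_up=rng.normal(size=(4, 16)),
)
print(gated_map.shape, gate_map.shape)
```

In a sketch like this, the (H, W) gate is the quantity one would upsample and overlay on the fundus image to inspect which regions the block emphasizes, which is the kind of lesion-focused attention map the qualitative evaluation refers to.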
