Explainable Convolutional Neural Networks for Retinal Fundus Classification and Cutting-Edge Segmentation Models for Retinal Blood Vessels from Fundus Images
Fatema Tuj Johora Faria, Mukaffi Bin Moin, Pronay Debnath, Asif Iftekher Fahim, Faisal Muhammad Shah
TL;DR
This work tackles early detection of retinal disease from fundus images with two end-to-end pipelines: (1) retinal fundus classification using eight pretrained CNNs combined with several explainable AI (XAI) techniques (Grad-CAM, Grad-CAM++, Score-CAM, Faster Score-CAM, LayerCAM), and (2) retinal blood vessel segmentation using TransUNet, Attention U-Net, and Swin-UNET with a range of backbones. The models are evaluated on the FIVES and DRIVE datasets. ResNet101 is reported as the top classifier (accuracy about 94.17%), and Attention U-Net with a ResNet101V2 backbone as the best segmenter on both datasets (IoU up to ~0.72 and Dice up to ~0.84 on FIVES). Pairing XAI visualizations with strong architectures improves interpretability and clinical trust, and the results serve as benchmarks for both classification and segmentation in retinal imaging. The findings suggest practical value for automated screening and vascular analysis, and provide a foundation for future work on model transferability, multimodal fusion, and real-time clinical deployment.
Abstract
Our research addresses early disease diagnosis through the analysis of retinal blood vessels in fundus images. Automatic segmentation of retinal blood vessels holds promise for early detection, but accurate analysis remains challenging because existing methods often lack discriminative power and are easily misled by pathological regions. For fundus image classification, we evaluate eight pre-trained CNN models. To enhance interpretability, we apply Explainable AI techniques: Grad-CAM, Grad-CAM++, Score-CAM, Faster Score-CAM, and LayerCAM. These techniques highlight the image regions that drive each model's predictions, fostering transparency and trust. For segmentation, we investigate ten models, including TransUNet with ResNet backbones, Attention U-Net with DenseNet and ResNet backbones, and Swin-UNET. The backbones include ResNet50V2, ResNet101V2, ResNet152V2, and DenseNet121, among others, and comparing them deepens our insight into attention mechanisms for fundus image analysis. Among the classification models, ResNet101 achieved the highest accuracy at 94.17%, while EfficientNetB0 achieved the lowest at 88.33%. In fundus image segmentation, Swin-UNET achieved a Mean Pixel Accuracy of 86.19%, whereas Attention U-Net with a DenseNet201 backbone had the lowest Mean Pixel Accuracy among the evaluated models at 75.87%.
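The segmentation results above are reported as IoU, Dice, and Mean Pixel Accuracy. As a minimal sketch of how such scores are commonly computed for a binary vessel mask, the following uses standard definitions; the function name `segmentation_metrics`, the smoothing term `eps`, and the choice to average pixel accuracy over the two classes (vessel and background) are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def segmentation_metrics(pred, target, eps=1e-7):
    """Compute IoU, Dice, and mean pixel accuracy for binary masks.

    Hypothetical helper: `pred` and `target` are arrays of the same shape,
    where nonzero entries mark vessel pixels. `eps` avoids division by zero.
    """
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)

    # Confusion-matrix counts over all pixels
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    tn = np.logical_and(~pred, ~target).sum()

    # Overlap-based scores for the vessel class
    iou = tp / (tp + fp + fn + eps)
    dice = 2 * tp / (2 * tp + fp + fn + eps)

    # Assumed definition: per-class pixel accuracy averaged over both classes
    vessel_acc = tp / (tp + fn + eps)
    background_acc = tn / (tn + fp + eps)
    mean_pixel_acc = (vessel_acc + background_acc) / 2

    return {
        "iou": float(iou),
        "dice": float(dice),
        "mean_pixel_accuracy": float(mean_pixel_acc),
    }

# Toy 2x2 example: one correct vessel pixel, one false positive
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(segmentation_metrics(pred, target))
```

For a single mask, Dice and IoU are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU on the same prediction.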
