Detecting Severity of Diabetic Retinopathy from Fundus Images: A Transformer Network-based Review

Tejas Karkera, Chandranath Adak, Soumi Chattopadhyay, Muhammad Saqib

TL;DR

This work tackles automatic DR severity grading from fundus images using an ensemble of four image transformers (ViT, DeiT, CaiT, BEiT). Using targeted preprocessing, fine-tuning, and two fusion strategies (weighted mean and majority voting), the authors show that the weighted-mean ensemble EiT_wm achieves the highest accuracy, 94.63%, with strong agreement with human raters (kappa ≈ 0.92) on the APTOS-2019 dataset, outperforming several CNN baselines and prior transformer approaches. Ablation studies and cross-dataset pretraining further demonstrate the robustness and potential of transformer ensembles for ophthalmic image analysis. The findings suggest that transformer-based ensembles can capture salient retinal features for DR severity while offering interpretable localization via Grad-CAM. Future work focuses on addressing class imbalance and incorporating lesion segmentation to further improve performance and explainability.
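The two fusion strategies named above can be illustrated with a minimal sketch. It assumes each of the four models outputs a probability vector over five DR severity grades (0 to 4); the function names, the example probabilities, the weights, and the tie-breaking rule are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_mean_fusion(prob_lists, weights):
    """Fuse per-model class probabilities with a weighted mean.

    prob_lists: one probability vector per model (same length each).
    weights: one non-negative weight per model (hypothetical values).
    Returns (fused_probabilities, predicted_class).
    """
    total = sum(weights)
    n_classes = len(prob_lists[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=fused.__getitem__)
    return fused, predicted


def majority_vote_fusion(prob_lists):
    """Each model votes for its argmax class; the most common vote wins.

    Ties are broken toward the lowest class index, which is an
    assumption of this sketch rather than a rule from the paper.
    """
    votes = [max(range(len(p)), key=p.__getitem__) for p in prob_lists]
    counts = Counter(votes)
    top = max(counts.values())
    return min(c for c, n in counts.items() if n == top)


# Illustrative (made-up) outputs for one image from four models.
vit = [0.10, 0.20, 0.60, 0.05, 0.05]
deit = [0.05, 0.50, 0.40, 0.03, 0.02]
cait = [0.05, 0.15, 0.70, 0.05, 0.05]
beit = [0.10, 0.45, 0.40, 0.03, 0.02]
models = [vit, deit, cait, beit]

fused, wm_pred = weighted_mean_fusion(models, weights=[0.3, 0.2, 0.3, 0.2])
mv_pred = majority_vote_fusion(models)
print("weighted mean:", wm_pred, "majority vote:", mv_pred)
```

In this made-up case the votes split two against two, so majority voting depends on the tie-break, while the weighted mean still uses how confident each model is. That difference is one plausible reason a weighted-mean ensemble can behave differently from a voting ensemble.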

Abstract

Diabetic Retinopathy (DR) is a significant health concern worldwide, primarily because it causes vision loss among people with diabetes. Ophthalmologists typically assess the severity of DR manually from fundus photographs of the retina. This paper addresses automated grading of the severity stages of DR. In the literature, researchers have approached this automation with traditional machine learning algorithms and convolutional architectures. However, past works have rarely focused on the essential parts of the retinal image to improve model performance. In this study, we adopt and fine-tune transformer-based learning models to capture the crucial features of retinal images for a more nuanced understanding of DR severity. Additionally, we explore how effectively image transformers infer the degree of DR severity from fundus photographs. For experiments, we used the publicly available APTOS-2019 blindness detection dataset, on which the transformer-based models performed encouragingly.

Paper Structure

This paper contains 22 sections, 12 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Fundus images with DR severity stages from APTOS-2019
  • Figure 2: Workflow of ViT
  • Figure 3: Internal view of a transformer encoder (TE)
  • Figure 4: The distillation procedure of DeiT
  • Figure 5: Workflow of CaiT
  • ...and 5 more figures