Diabetic Retinopathy Grading with CLIP-based Ranking-Aware Adaptation: A Comparative Study on Fundus Image

Sungjun Cho

Abstract

Diabetic retinopathy (DR) is a leading cause of preventable blindness, and automated fundus image grading can play an important role in large-scale screening. In this work, we investigate three CLIP-based approaches for five-class DR severity grading: (1) a zero-shot baseline using prompt engineering, (2) a hybrid FCN-CLIP model augmented with CBAM attention, and (3) a ranking-aware prompting model that encodes the ordinal structure of DR progression. We train and evaluate on a combined dataset of APTOS 2019 and Messidor-2 (n=5,406), addressing class imbalance through resampling and class-specific optimal thresholding. Our experiments show that the ranking-aware model achieves the highest overall accuracy (93.42%, AUROC 0.9845) and strong recall on clinically critical severe cases, while the hybrid FCN-CLIP model (92.49%, AUROC 0.99) excels at detecting proliferative DR. Both substantially outperform the zero-shot baseline (55.17%, AUROC 0.75). We analyze the complementary strengths of each approach and discuss their practical implications for screening contexts.
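As a rough illustration of how a zero-shot baseline of the kind described above might score an image, the sketch below applies a softmax to scaled cosine similarities between one image embedding and five prompt embeddings. It is not the paper's implementation: the prompt wording, the `zero_shot_probs` helper, the 512-dimensional embeddings, and the logit scale of 100 are all assumptions made for this example, and random vectors stand in for the outputs of a real CLIP encoder.

```python
import numpy as np

# Hypothetical prompts, one per DR grade (0-4); the paper's actual prompts may differ.
PROMPTS = [
    "a fundus photograph with no diabetic retinopathy",
    "a fundus photograph with mild diabetic retinopathy",
    "a fundus photograph with moderate diabetic retinopathy",
    "a fundus photograph with severe diabetic retinopathy",
    "a fundus photograph with proliferative diabetic retinopathy",
]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and K prompts."""
    # L2-normalise so the dot product below is a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = logit_scale * (text_embs @ image_emb)
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


# Synthetic stand-ins for encoder outputs (assumption: 512-dimensional embeddings).
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(len(PROMPTS), 512))
# Build an image embedding that lies close to the "severe" prompt (index 3).
image_emb = text_embs[3] + 0.1 * rng.normal(size=512)

probs = zero_shot_probs(image_emb, text_embs)
predicted_grade = int(np.argmax(probs))
```

In practice the two embedding arrays would come from a CLIP image encoder and text encoder, and the predicted grade is simply the index of the prompt with the highest probability.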

Paper Structure

This paper contains 31 sections, 8 figures, and 5 tables.

Introduction
Related Work
Deep learning for DR grading.
Vision-language models in medical imaging.
Ordinal classification and ranking-aware learning.
Attention mechanisms for lesion localization.
Methods
Zero-shot CLIP Baseline
Hybrid FCN-CLIP with CBAM Attention
Ranking-aware Prompting Model
Dataset and Preprocessing
Experimental Setup
Results
Overall Performance
Error Distribution
...and 16 more sections

Figures (8)

Figure 1: Attention map comparison for a mild DR case. Left: zero-shot CLIP (diffuse, anatomy-focused). Right: Hybrid FCN-CLIP with CBAM (concentrated on lesion regions).
Figure 2: Training and validation curves: Ranking-aware prompting model.
Figure 3: Training and validation curves: Hybrid FCN-CLIP model.
Figure 4: Confusion matrix: Ranking-aware prompting model.
Figure 5: One-vs-rest ROC curves: Ranking-aware model.
...and 3 more figures
