ETSCL: An Evidence Theory-Based Supervised Contrastive Learning Framework for Multi-modal Glaucoma Grading

Zhiyuan Yang, Bo Zhang, Yufei Shi, Ningze Zhong, Johnathan Loh, Huihui Fang, Yanwu Xu, Si Yong Yeo

TL;DR

This work addresses multi-modality glaucoma grading using CFP and OCT by integrating contrastive feature learning with a Frangi vesselness auxiliary branch and an uncertainty-aware decision-level fusion based on evidence theory. The two-stage ETSCL framework first learns discriminative modality-specific embeddings via supervised contrastive loss and vessel preprocessing, then fuses the outputs through Dirichlet-based priors and Dempster's rule to produce uncertainty-aware predictions. Key contributions include (i) a triple-branch feature extractor including a Vessel branch, (ii) a Dirichlet/Evidence Theory–driven fusion mechanism with uncertainty estimation, and (iii) state-of-the-art results on the GAMMA dataset with kappa = 0.8844 and accuracy = 0.84. The approach demonstrates strong performance gains over baselines and provides a principled way to quantify modality-level uncertainty, though it is limited by dataset size and the absence of private data for broader generalization testing.
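The contrastive stage can be illustrated with a minimal sketch of a supervised contrastive loss. It assumes the commonly used formulation in which each anchor is pulled toward samples sharing its label and pushed away from all others. The function names, the temperature of 0.1, and the toy embeddings are illustrative assumptions, not values taken from the paper or its repository.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Average supervised contrastive loss over anchors that have a positive.

    temperature=0.1 is an assumed default, not a value from the paper.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        # Scaled similarity of the anchor to every other sample in the batch.
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits.values())
        log_denominator = peak + math.log(
            sum(math.exp(s - peak) for s in logits.values())
        )
        total += -sum(logits[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    # Toy 2-D embeddings: two tight clusters (hypothetical data).
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supervised_contrastive_loss(toy, [0, 0, 1, 1]))  # labels match clusters
    print(supervised_contrastive_loss(toy, [0, 1, 0, 1]))  # labels cut across clusters
```

When the labels agree with the cluster structure the loss is small, and it grows when same-label samples lie far apart, which is the property that encourages discriminative features for visually similar images.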

Abstract

Glaucoma is one of the leading causes of vision impairment. Digital imaging techniques, such as color fundus photography (CFP) and optical coherence tomography (OCT), provide quantitative and noninvasive methods for glaucoma diagnosis. Recently, in the field of computer-aided glaucoma diagnosis, multi-modality methods that integrate the CFP and OCT modalities have achieved greater diagnostic accuracy compared to single-modality methods. However, it remains challenging to extract reliable features due to the high similarity of medical images and the unbalanced multi-modal data distribution. Moreover, existing methods overlook the uncertainty estimation of different modalities, leading to unreliable predictions. To address these challenges, we propose a novel framework, namely ETSCL, which consists of a contrastive feature extraction stage and a decision-level fusion stage. Specifically, the supervised contrastive loss is employed to enhance the discriminative power in the feature extraction process, resulting in more effective features. In addition, we utilize the Frangi vesselness algorithm as a preprocessing step to incorporate vessel information to assist in the prediction. In the decision-level fusion stage, an evidence theory-based multi-modality classifier is employed to combine multi-source information with uncertainty estimation. Extensive experiments demonstrate that our method achieves state-of-the-art performance. The code is available at https://github.com/master-Shix/ETSCL.
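The decision-level fusion idea can be sketched as follows. This is not the authors' implementation: it assumes the widely used subjective-logic formulation, in which non-negative evidence e_k defines Dirichlet parameters alpha_k = e_k + 1, belief masses b_k = e_k / S and uncertainty u = K / S with S = sum(alpha), and two such opinions are merged with a reduced form of Dempster's rule. All function names and evidence values are hypothetical.

```python
def opinion_from_evidence(evidence):
    """Turn per-class evidence into (belief masses, uncertainty).

    Assumes alpha_k = e_k + 1, so beliefs and uncertainty sum to 1.
    """
    num_classes = len(evidence)
    strength = sum(e + 1.0 for e in evidence)  # Dirichlet strength S
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return beliefs, uncertainty


def dempster_combine(opinion_a, opinion_b):
    """Merge two opinions with the reduced Dempster's rule of combination."""
    beliefs_a, u_a = opinion_a
    beliefs_b, u_b = opinion_b
    agreement = sum(x * y for x, y in zip(beliefs_a, beliefs_b))
    # Conflict: mass the two sources assign to different classes.
    conflict = sum(beliefs_a) * sum(beliefs_b) - agreement
    scale = 1.0 / (1.0 - conflict)
    beliefs = [
        scale * (x * y + x * u_b + y * u_a)
        for x, y in zip(beliefs_a, beliefs_b)
    ]
    uncertainty = scale * u_a * u_b
    return beliefs, uncertainty


if __name__ == "__main__":
    # Hypothetical evidence for three glaucoma grades from two modalities.
    cfp_opinion = opinion_from_evidence([8.0, 1.0, 0.0])  # confident source
    oct_opinion = opinion_from_evidence([1.0, 1.0, 1.0])  # uninformative source
    fused_beliefs, fused_uncertainty = dempster_combine(cfp_opinion, oct_opinion)
    print(fused_beliefs, fused_uncertainty)
```

Because the rule is applied pairwise, a third source such as a vessel branch could be folded in by combining its opinion with the result of the first two. A source with high uncertainty contributes little to the fused belief, which is how this kind of fusion down-weights an unreliable modality.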

Paper Structure (14 sections, 8 equations, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Multi-modal imaging from the GAMMA dataset, illustrating glaucoma progression across three stages. Column (a) shows images without glaucoma, column (b) early-stage glaucoma, and column (c) intermediate-to-advanced-stage glaucoma. The top row displays CFP images, in which the vertical cup-to-disc ratio (vCDR) increases with disease stage. The middle row displays OCT samples, showing thinning of the retinal nerve fiber layer (RNFL) and the ganglion cell-inner plexiform layer (GCIPL). The bottom row displays vessel structures extracted from the CFP images.
  • Figure 2: Overview of the proposed ETSCL framework.