
Cross-Fundus Transformer for Multi-modal Diabetic Retinopathy Grading with Cataract

Fan Xiao, Junlin Hou, Ruiwei Zhao, Rui Feng, Haidong Zou, Lina Lu, Yi Xu, Juzhao Zhang

TL;DR

This is the first study to explore a multi-modal deep learning framework that fuses information from color fundus photography (CFP) and infrared fundus photography (IFP) for more accurate diabetic retinopathy (DR) grading. It builds a dual-stream architecture, the Cross-Fundus Transformer (CFT), that fuses the ViT-based features of the two fundus image modalities.

Abstract

Diabetic retinopathy (DR) is a leading cause of blindness worldwide and a common complication of diabetes. As two different imaging tools for DR grading, color fundus photography (CFP) and infrared fundus photography (IFP) are highly correlated and complementary in clinical applications. To the best of our knowledge, this is the first study to explore a multi-modal deep learning framework that fuses information from CFP and IFP for more accurate DR grading. Specifically, we construct a dual-stream architecture, the Cross-Fundus Transformer (CFT), to fuse the ViT-based features of the two fundus image modalities. In particular, a carefully designed Cross-Fundus Attention (CFA) module is introduced to capture the correspondence between CFP and IFP images. Moreover, we adopt both single-modality and multi-modality supervision to maximize the overall DR grading performance. Extensive experiments on a clinical dataset of 1,713 pairs of multi-modal fundus images demonstrate the superiority of our proposed method. Our code will be released for public access.
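
The abstract describes two ViT streams whose token features are fused by the CFA module and trained with both single-modality and multi-modality supervision. This summary does not give the CFA internals or the loss formula, so the following is only a hypothetical stand-in, not the authors' method: it uses plain bidirectional scaled dot-product cross-attention with residual connections (no learned projections, a single head) and assumes a total loss of the form L = L_fused + λ(L_CFP + L_IFP), borrowing only the name λ from Figure 3. All function names are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # list of token rows, each a feature vector


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: query tokens of one modality attend to the other."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    scores = matmul(queries, transpose(keys))
    weights = softmax_rows([[s * scale for s in row] for row in scores])
    return matmul(weights, values)


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def cross_fundus_fusion(cfp: Matrix, ifp: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical CFA stand-in: CFP tokens query IFP tokens and vice versa,
    each result added back to its own stream as a residual."""
    cfp_fused = add(cfp, cross_attention(cfp, ifp, ifp))
    ifp_fused = add(ifp, cross_attention(ifp, cfp, cfp))
    return cfp_fused, ifp_fused


def cross_entropy(logits: List[float], label: int) -> float:
    """Cross-entropy of one sample via log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]


def total_loss(fused: List[float], cfp: List[float], ifp: List[float],
               label: int, lam: float) -> float:
    """Assumed combination: multi-modal loss plus lambda-weighted single-modality losses."""
    return cross_entropy(fused, label) + lam * (
        cross_entropy(cfp, label) + cross_entropy(ifp, label)
    )
```

As a sanity check, with five DR grades and uniform logits every cross-entropy term equals log 5, so `total_loss` with `lam=0.5` returns 2·log 5. The real CFA presumably adds learned query/key/value projections and multiple heads; they are omitted here to keep the attention pattern itself visible.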

Paper Structure

This paper contains 12 sections, 6 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Examples of CFP and IFP images. (a) Both images are clear; (b) the same lesion is unclear in CFP but clear in IFP; (c) comparison of different DR lesions in CFP and IFP.
  • Figure 2: The architecture of our Cross-Fundus Transformer (CFT) for DR grading. It consists of Transformer encoders, linear projection layers, a Cross-Fundus Attention (CFA) module, a classifier layer, and MLP heads.
  • Figure 3: Results for different loss-function weights $\lambda$. The numbers in square brackets indicate the 95% confidence intervals.
  • Figure 4: Visualization of CFP and IFP for DR grading via Attention Rollout abnar2020quantifying.
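
Figure 4 relies on Attention Rollout (abnar2020quantifying). As a rough guide to how such maps are typically computed, here is a pure-Python sketch of the general recipe: average the heads of each layer, account for residual connections by mixing in the identity (0.5·A + 0.5·I), re-normalize the rows, and multiply the layer matrices from first to last. The head averaging and the 0.5 mixing weight are common defaults of the technique, not details confirmed for this paper, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # n x n attention matrix (rows = query tokens)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def average_heads(heads: List[Matrix]) -> Matrix:
    """Mean attention over all heads of one layer."""
    n = len(heads[0])
    return [[sum(h[i][j] for h in heads) / len(heads) for j in range(n)] for i in range(n)]


def add_residual(attn: Matrix) -> Matrix:
    """Model the skip connection as 0.5*A + 0.5*I, then re-normalize each row to sum to 1."""
    n = len(attn)
    mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)] for i in range(n)]
    out = []
    for row in mixed:
        total = sum(row)
        out.append([v / total for v in row])
    return out


def attention_rollout(layers: List[List[Matrix]]) -> Matrix:
    """layers[l] is the list of per-head matrices of layer l.
    Returns R = A_L ... A_2 A_1, where each A_l is head-averaged and residual-mixed."""
    n = len(layers[0][0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for heads in layers:
        rollout = matmul(add_residual(average_heads(heads)), rollout)
    return rollout
```

Because every per-layer matrix is row-stochastic, so is the rollout. If a stream prepends a [CLS] token, row 0 of the result gives the relevance of each patch token, which can be reshaped into a heat map over the image. For a dual-stream model such as CFT, one would presumably compute this separately for the CFP and IFP encoders; that split is an assumption, not something this summary states.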