Learning to Fuse and Reconstruct Multi-View Graphs for Diabetic Retinopathy Grading

Haoran Li; Yuxin Lin; Huan Wang; Xiaoling Luo; Qi Zhu; Jiahua Shi; Huaming Chen; Bo Du; Johan Barthelemy; Zongyan Xue; Jun Shen; Yong Xu

TL;DR

Extensive experimental results on MFIDDR, by far the largest multi-view fundus image dataset, demonstrate the superiority of the proposed approach over existing state-of-the-art approaches in diabetic retinopathy grading.

Abstract

Diabetic retinopathy (DR) is one of the leading causes of vision loss worldwide, making early and accurate DR grading critical for timely intervention. Recent clinical practice leverages multi-view fundus images for DR detection because they cover a wide field of view (FOV), motivating deep learning methods that explore multi-view learning for DR grading. However, existing methods often overlook inter-view correlations when fusing multi-view fundus images and thus fail to fully exploit the inherent consistency across views originating from the same patient. In this work, we present MVGFDR, an end-to-end Multi-View Graph Fusion framework for DR grading. Unlike existing methods that directly fuse visual features from multiple views, MVGFDR is equipped with a novel Multi-View Graph Fusion (MVGF) module that explicitly disentangles shared and view-specific visual features. Specifically, MVGF comprises three key components: (1) Multi-view Graph Initialization, which constructs visual graphs via residual-guided connections and employs Discrete Cosine Transform (DCT) coefficients as frequency-domain anchors; (2) Multi-view Graph Fusion, which integrates selected nodes across multi-view graphs based on frequency-domain relevance to capture complementary view-specific information; and (3) Masked Cross-view Reconstruction, which leverages masked reconstruction of shared information across views to facilitate view-invariant representation learning. Extensive experimental results on MFIDDR, by far the largest multi-view fundus image dataset, demonstrate the superiority of our proposed approach over existing state-of-the-art approaches in diabetic retinopathy grading.
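As a rough illustration of the frequency-domain idea in the abstract, the NumPy sketch below computes a 2D DCT of a feature map and partitions the coefficients into low-, mid- and high-frequency bands. The band thresholds (`low_cut`, `high_cut`), the normalised-index rule used to assign a coefficient to a band, and all function names are assumptions made for illustration; the abstract does not specify how MVGFDR defines its bands or anchors.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of shape (n, n)."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale
    return mat


def dct2(x: np.ndarray) -> np.ndarray:
    """2D orthonormal DCT-II of a (H, W) array."""
    h, w = x.shape
    return dct_matrix(h) @ x @ dct_matrix(w).T


def split_bands(coeffs: np.ndarray, low_cut: float = 0.15, high_cut: float = 0.5):
    """Split DCT coefficients into low/mid/high bands.

    A coefficient at (u, v) is assigned by its normalised index
    (u / H + v / W) / 2. The cut-offs are hypothetical values.
    """
    h, w = coeffs.shape
    u = np.arange(h)[:, None] / h
    v = np.arange(w)[None, :] / w
    radius = (u + v) / 2.0
    low = np.where(radius < low_cut, coeffs, 0.0)
    high = np.where(radius >= high_cut, coeffs, 0.0)
    mid = coeffs - low - high  # everything between the two cut-offs
    return low, mid, high


# Usage on a random stand-in for a feature map.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((16, 16))
low, mid, high = split_bands(dct2(feature_map))
band_energy = {name: float(np.sum(b ** 2))
               for name, b in (("low", low), ("mid", mid), ("high", high))}
```

Because the three bands are disjoint and the transform is orthonormal, the band energies sum to the energy of the input, which makes them a convenient per-band relevance score in a sketch like this one.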

Paper Structure (17 sections, 20 equations, 6 figures, 7 tables)

Introduction
Related Work
Deep Learning in DR Grading
Visual Graph Learning
Method
Overview
Multi-View Graph Fusion
Masked Cross-View Reconstruction
GCN-based Reconstructor (GCR)
Cross-View Reconstructor (CVR)
Experiments
Experimental Setup
Comparison with State-of-the-art Methods
Additional Experiments on the Generated Multi-view DR Dataset
Ablation Study
...and 2 more sections

Figures (6)

Figure 1: Illustration of our motivation. (a) Semantic information captured by different DCT frequency components in fundus images. (b) Comparison between (i) existing multi-view methods and (ii) our approach. Unlike prior methods, our approach disentangles the feature embeddings into shared and unique components, applying feature fusion and masked reconstruction learning separately to each. (c) Comparison of state-of-the-art multi-view DR methods and our proposed MVGFDR on seven evaluation metrics.
Figure 2: Overall framework of MVGFDR. (a) Multi-view Graph Initialization (MVGI) for each view based on DCT frequency. (b) Graph node selection from each view according to the corresponding high-, mid- and low-frequency DCT components. (c) Multi-view Graph Fusion (MGF) of high-frequency information using a GCN. (d) Masked Cross-View Reconstruction (MCVR) on low- and mid-frequency information with the random node masking rate $\eta$.
Figure 3: Visualization of view-specific representations extracted by MVGI from different views. A representative Grade-4 sample is shown for illustration. The visualization focuses on the high-frequency (DCT) representations, which are associated with lesion-related patterns.
Figure 4: Proposed Masked Cross-View Reconstructor (MCVR). (a) A simple cross-graph reconstructor serves as the comparison baseline. (b) The proposed MCVR adopts a decoder-only transformer to perform masked view reconstruction guided by view-positional embeddings (VP) and frequency-positional embeddings (FP).
Figure 5: Examples of V1-V4 views of different datasets. (a) MFIDDR. (b) The generated MVG-DDR.
...and 1 more figure
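The random node masking with rate $\eta$ mentioned in the Figure 2 caption can be sketched as follows. This is only a minimal illustration of the masking step, not the reconstruction itself: the function name `mask_nodes`, the zero-fill of masked nodes, and the rounding of $\eta N$ to an integer count are assumptions, and the paper may instead use, for example, a learnable mask token.

```python
import numpy as np


def mask_nodes(features: np.ndarray, eta: float, rng: np.random.Generator):
    """Randomly mask a fraction eta of graph nodes.

    features: (N, D) node feature matrix.
    Returns the masked copy and a boolean mask (True = node was masked).
    Zero-filling is an assumption made for this sketch.
    """
    n = features.shape[0]
    n_mask = int(round(eta * n))
    idx = rng.choice(n, size=n_mask, replace=False)  # nodes to hide
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    masked = features.copy()
    masked[mask] = 0.0
    return masked, mask


# Usage: hide 30% of 10 nodes; a reconstructor would then be trained
# to predict features[mask] from the visible nodes of the other views.
rng = np.random.default_rng(0)
features = np.ones((10, 4))
masked, mask = mask_nodes(features, eta=0.3, rng=rng)
```

Returning the boolean mask alongside the masked features lets a reconstruction loss be restricted to the hidden nodes only.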
