Wavelet-based Global-Local Interaction Network with Cross-Attention for Multi-View Diabetic Retinopathy Detection

Yongting Hu, Yuxin Lin, Chengliang Liu, Xiaoling Luo, Xiaoyan Dou, Qihao Xu, Yong Xu

TL;DR

This work tackles multi-view diabetic retinopathy (DR) detection by learning both local lesion details and global context across four fundus views. It introduces a two-branch network, with a CNN branch for local features and a Transformer branch for global dependencies, in which wavelet high-frequency components are used to enhance lesion edges. A cross-view fusion module (CVFM) then applies cross-attention with a learnable query to reduce redundancy between views. The wavelet-based global-local interaction module (WGLIM) handles lesion feature learning within each view, the CVFM handles fusion across views, and the final prediction is formed by fusing the logits of the two branches. Experiments on the large MFIDDR dataset show competitive and often superior performance across multiple metrics, and the code is open-sourced for reproducibility, which supports its potential use in automated DR screening.
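To make the idea of cross-attention with a learnable query more concrete, here is a minimal NumPy sketch. It is not the authors' CVFM: the function name `cross_view_attention`, the feature size, the single attention head, and the random projection matrices are all assumptions chosen for illustration. In a trained model the query and the projections would be learned parameters; here they are just random arrays.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(query, view_feats, w_q, w_k, w_v):
    """Fuse per-view features with one query vector (hypothetical helper).

    query:      (1, d) array standing in for a learnable query
    view_feats: (V, d) array, one feature vector per fundus view
    w_q, w_k, w_v: (d, d) projection matrices
    """
    q = query @ w_q                           # (1, d)
    k = view_feats @ w_k                      # (V, d)
    v = view_feats @ w_v                      # (V, d)
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (1, V) scaled dot products
    weights = softmax(scores, axis=-1)        # attention over the V views
    fused = weights @ v                       # (1, d) weighted sum of views
    return fused, weights

rng = np.random.default_rng(0)
d, num_views = 8, 4                           # four views, as in the text; d is arbitrary
view_feats = rng.normal(size=(num_views, d))
query = rng.normal(size=(1, d))               # random stand-in for the learned query
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

fused, weights = cross_view_attention(query, view_feats, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Because the query attends over all views at once, the output is a single fused vector whose per-view weights sum to one, which is one plausible way for a model to down-weight views that carry redundant information.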

Abstract

Multi-view diabetic retinopathy (DR) detection has recently emerged as a promising way to address the incomplete lesion coverage of single-view DR detection. However, it remains challenging because lesions vary in size and are scattered in location. Furthermore, existing multi-view DR methods typically merge multiple views without considering the correlations and redundancies of lesion information across them. Therefore, we propose a novel method to overcome the challenges of difficult lesion information learning and inadequate multi-view fusion. Specifically, we introduce a two-branch network to obtain both local lesion features and their global dependencies. The high-frequency component of the wavelet transform is used to exploit lesion edge information, which is then enhanced by global semantics to facilitate the learning of difficult lesions. Additionally, we present a cross-view fusion module to improve multi-view fusion and reduce redundancy. Experimental results on large public datasets demonstrate the effectiveness of our method. The code is available at https://github.com/HuYongting/WGLIN.
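As a rough illustration of why wavelet high-frequency components carry edge information, the sketch below applies a single-level 2D Haar transform to a toy image with one vertical step edge. The choice of the Haar wavelet, the helper name `haar_dwt2`, and the subband names are assumptions made for this example; the abstract does not state which wavelet or implementation the authors use.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform (hypothetical helper).

    img must have even height and width. Returns the low-frequency
    subband and three high-frequency (detail) subbands, each half-size.
    """
    a = img[0::2, 0::2]   # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]   # top-right
    c = img[1::2, 0::2]   # bottom-left
    d = img[1::2, 1::2]   # bottom-right
    low = (a + b + c + d) / 2.0        # local average (coarse content)
    row_diff = (a + b - c - d) / 2.0   # change between rows: horizontal edges
    col_diff = (a - b + c - d) / 2.0   # change between columns: vertical edges
    diag = (a - b - c + d) / 2.0       # diagonal detail
    return low, row_diff, col_diff, diag

# Toy 8x8 image: dark on the left, bright from column 3 onward.
img = np.zeros((8, 8))
img[:, 3:] = 1.0

low, row_diff, col_diff, diag = haar_dwt2(img)

# Combine the detail subbands into one edge-strength map.
edge_map = np.sqrt(row_diff**2 + col_diff**2 + diag**2)
print(edge_map)
```

The edge map is nonzero only in the subband column that straddles the step, while the flat regions on either side contribute nothing. This is the property that makes high-frequency subbands a reasonable cue for lesion boundaries.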

Paper Structure

This paper contains 16 sections, 14 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Samples of color fundus images. (a) Fundus image and zoomed-in region. (b) Multi-view fundus images with lesion distribution. The yellow box encloses the lesions of the same region.
  • Figure 2: Overview of the proposed method.
  • Figure 3: Structure of the wavelet based global-local interaction module (WGLIM). Take the input of one view as an example.
  • Figure 4: Structure of the proposed cross-view fusion module (CVFM).
  • Figure 5: Comparative studies.