GLEAM: A Multimodal Imaging Dataset and HAMM for Glaucoma Classification

Jiao Wang; Chi Liu; Yiying Zhang; Hongchen Luo; Zhifen Guo; Ying Hu; Ke Xu; Jing Zhou; Hongyan Xu; Ruiting Zhou; Man Tang

Abstract

We propose glaucoma lesion evaluation and analysis with multimodal imaging (GLEAM), the first publicly available tri-modal glaucoma dataset. It comprises scanning laser ophthalmoscopy (SLO) fundus images, circumpapillary optical coherence tomography (OCT) images, and visual field (VF) pattern deviation maps, annotated with four disease stages. The dataset makes it possible to exploit complementary information across modalities and supports accurate diagnosis and treatment across disease stages. To integrate cross-modal information, we propose hierarchical attentive masked modeling (HAMM) for multimodal glaucoma classification. The framework pairs hierarchical attentive encoders with light decoders so that cross-modal representation learning is concentrated in the encoder.

Paper Structure (29 sections, 15 equations, 8 figures, 8 tables)

Introduction
Related Work
Glaucoma Datasets
Glaucoma Classification Methods
Glaucoma Dataset
Data Collection and Quality Control
Image Processing for Dataset
Dataset Character
Proposed Method
Multimodal Feature Extraction with MCGA
Masked AutoEncoder Pretraining for Multimodal Representation Learning
Fine-tuning for Multimodal Glaucoma Classification
Experiments and Results
Implementation details
Experiment Environment
...and 14 more sections

Figures (8)

Figure 1: The detail of ophthalmic data, including (a) SLO, OCT, and VF examinations (b) EMR documented by ophthalmologists.
Figure 2: Characteristics of the GLEAM dataset, including (a) distributions of GLEAM and (b) examples from GLEAM. Lines marking the optic disc/optic cup boundary and the thickness of the retinal nerve fiber layer (RNFL) are added to make the differences between images at different stages of glaucoma progression easier to see.
Figure 3: The architecture of HAMM. Stage one is masked autoencoder pretraining, which learns a better feature representation; stage two is the classification task, which trains the final classification model. The MCGA module models multimodal information across the different images during both pretraining and classification. For clarity, token notations (e.g., $c^f$, $\hat{v}^f$, $v^f$) are shown for a single modality, though the same applies to all.
Figure 4: Visualization of regions of interest identified by the model using Guided Grad-CAM.
Figure 5: Comprehensive evaluation of model feature representations and classification performance under three input settings: (a), (b), and (c) are t-SNE visualizations of feature embeddings for the VF, SLO+VF, and SLO+OCT+VF input settings, respectively; (e), (f), and (g) are confusion matrices for the same three input settings.
...and 3 more figures
