MultiEYE: Dataset and Benchmark for OCT-Enhanced Retinal Disease Recognition from Fundus Images

Lehan Wang, Chongchong Qi, Chubin Ou, Lin An, Mei Jin, Xiangbin Kong, Xiaomeng Li

TL;DR

This work introduces the OCT-CoDA framework for OCT-enhanced retinal disease recognition from fundus images using unpaired multi-modal data. It leverages LLM-generated disease concepts to ground cross-modal knowledge via a concept-decoupled classifier and two distillation losses (global prototypical and local contrastive) from an OCT teacher to a fundus student. A large-scale MultiEYE dataset of unpaired fundus and OCT images across nine diseases supports evaluation, showing consistent performance gains and interpretability over single-modal baselines and existing cross-modal methods. The approach promises practical clinical impact by enabling improved fundus-based diagnosis without requiring paired patient data, and the dataset/code are publicly available.

Abstract

Existing multi-modal learning methods on fundus and OCT images mostly require both modalities to be available and strictly paired for training and testing, which appears less practical in clinical scenarios. To expand the scope of clinical applications, we formulate a novel setting, "OCT-enhanced disease recognition from fundus images", that allows for the use of unpaired multi-modal data during the training phase and relies on the widespread fundus photographs for testing. To benchmark this setting, we present the first large multi-modal multi-class dataset for eye disease diagnosis, MultiEYE, and propose an OCT-assisted Conceptual Distillation Approach (OCT-CoDA), which employs semantically rich concepts to extract disease-related knowledge from OCT images and leverage them into the fundus model. Specifically, we regard the image-concept relation as a link to distill useful knowledge from the OCT teacher model to the fundus student model, which considerably improves the diagnostic performance based on fundus images and formulates the cross-modal knowledge transfer into an explainable process. Through extensive experiments on the multi-disease classification task, our proposed OCT-CoDA demonstrates remarkable results and interpretability, showing great potential for clinical application. Our dataset and code are available at https://github.com/xmed-lab/MultiEYE.
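Because the OCT and fundus images are unpaired, knowledge cannot be transferred sample by sample; it has to pass through something the two modalities share, such as class labels and the concept pool. The sketch below illustrates one way this could look: the teacher's image-concept similarity vectors are averaged per class into prototypes, and each student vector is pulled toward the prototype of its own class. This is an illustrative assumption, not the paper's actual loss. The function names, the squared-distance penalty, and the `disease_a`/`disease_b` labels are all hypothetical.

```python
def class_prototypes(sim_vectors, labels):
    """Average the concept-similarity vectors belonging to each class."""
    sums, counts = {}, {}
    for vec, label in zip(sim_vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {k: [x / counts[k] for x in v] for k, v in sums.items()}


def prototype_distill_loss(student_sims, student_labels, teacher_protos):
    """Mean squared distance from each student vector to its class prototype.

    Hypothetical stand-in for a prototype-style distillation term; student
    samples whose class has no teacher prototype are skipped.
    """
    total, n = 0.0, 0
    for vec, label in zip(student_sims, student_labels):
        if label not in teacher_protos:
            continue
        proto = teacher_protos[label]
        total += sum((a - b) ** 2 for a, b in zip(vec, proto)) / len(vec)
        n += 1
    return total / n if n else 0.0


# Toy, made-up similarity vectors over a two-concept pool.
teacher_sims = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]   # OCT teacher
teacher_labels = ["disease_a", "disease_a", "disease_b"]
protos = class_prototypes(teacher_sims, teacher_labels)

student_sims = [[0.7, 0.3], [0.1, 0.9]]               # fundus student
student_labels = ["disease_a", "disease_b"]
loss = prototype_distill_loss(student_sims, student_labels, protos)
```

In this toy case the student vectors already coincide with the teacher prototypes, so the loss is effectively zero; a student vector of `[0.5, 0.5]` labelled `disease_a` would be penalised instead. No OCT image is ever matched to a specific fundus image, which is the property the unpaired setting requires.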

Paper Structure

This paper contains 28 sections, 9 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: Comparison between multi-modal learning, FDDM [wang2023fundus], and our method. (a) Multi-modal learning requires paired data in both the training and test stages. (b) FDDM and (c) our method both train on unpaired data and use only one modality for testing. Our method targets the fundus model and integrates concepts as additional guidance to help transfer teacher-dominant features to the student model.
  • Figure 2: The framework of the proposed OCT-CoDA method. The pre-trained OCT model serves as the teacher for training the fundus student model. A batch of unpaired OCT images and fundus photos is fed into separate image encoders to extract features. To implement the conceptual distillation, we first prompt the LLM to generate a concept pool. Second, we compute the similarity between image features and concept embeddings for each modality. Finally, the OCT-assisted distillation is performed based on the image-concept similarity. At inference, this similarity matrix is fed into a Fully Connected (FC) layer to obtain the prediction score.
  • Figure 3: Examples of the concept generation process with LLM. (a) The typical prompt is used to trigger the LLM. (b) We apply our refined CoT prompt for concept generation.
  • Figure 4: Examples of the MultiEYE dataset. We select one sample for each category in each sub-dataset.
  • Figure 5: Analysis of the loss weight sensitivity on the MultiEYE dataset. We draw the curve of the Precision-Recall F1 score.
  • ...and 3 more figures
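
The inference path described in the Figure 2 caption (image feature, similarity to each concept embedding, FC layer, prediction score) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: cosine similarity is assumed as the similarity measure, and the function names, feature vectors, concept embeddings, and FC weights are hypothetical toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def concept_similarity(image_feature, concept_embeddings):
    """One similarity score per concept in the pool."""
    return [cosine(image_feature, c) for c in concept_embeddings]


def fc_layer(similarities, weights, bias):
    """Linear layer mapping concept similarities to per-disease scores."""
    return [sum(w * s for w, s in zip(row, similarities)) + b
            for row, b in zip(weights, bias)]


# Toy values, all hypothetical: three concepts, two diseases.
concepts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fundus_feature = [2.0, 0.0]
weights = [[1.0, 0.0, 0.0],   # disease 0 relies on concept 0
           [0.0, 1.0, 0.0]]   # disease 1 relies on concept 1
bias = [0.0, 0.0]

sims = concept_similarity(fundus_feature, concepts)
scores = fc_layer(sims, weights, bias)
predicted = max(range(len(scores)), key=scores.__getitem__)
```

Because the classifier sees only concept similarities, each FC weight ties one concept to one disease, so a prediction can be traced back to the concepts that contributed most to its score.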