MultiEYE: Dataset and Benchmark for OCT-Enhanced Retinal Disease Recognition from Fundus Images
Lehan Wang, Chongchong Qi, Chubin Ou, Lin An, Mei Jin, Xiangbin Kong, Xiaomeng Li
TL;DR
This work introduces OCT-CoDA, a framework for OCT-enhanced retinal disease recognition from fundus images that is trained on unpaired multi-modal data. It uses LLM-generated disease concepts to ground cross-modal knowledge through a concept-decoupled classifier and two distillation losses (global prototypical and local contrastive) that transfer knowledge from an OCT teacher to a fundus student. Evaluation is supported by MultiEYE, a large-scale dataset of unpaired fundus and OCT images covering nine diseases, on which OCT-CoDA shows consistent performance gains over single-modal baselines and existing cross-modal methods while remaining interpretable. Because training does not require paired patient data, the approach can improve fundus-based diagnosis in practical clinical settings. The dataset and code are publicly available.
Abstract
Existing multi-modal learning methods on fundus and OCT images mostly require both modalities to be available and strictly paired for training and testing, which is less practical in clinical scenarios. To expand the scope of clinical applications, we formulate a novel setting, "OCT-enhanced disease recognition from fundus images", that allows the use of unpaired multi-modal data during training and relies on widely available fundus photographs for testing. To benchmark this setting, we present the first large multi-modal multi-class dataset for eye disease diagnosis, MultiEYE, and propose an OCT-assisted Conceptual Distillation Approach (OCT-CoDA), which employs semantically rich concepts to extract disease-related knowledge from OCT images and transfer it to the fundus model. Specifically, we regard the image-concept relation as a link to distill useful knowledge from the OCT teacher model to the fundus student model, which considerably improves diagnostic performance on fundus images and makes the cross-modal knowledge transfer an explainable process. Through extensive experiments on the multi-disease classification task, our proposed OCT-CoDA demonstrates remarkable results and interpretability, showing great potential for clinical application. Our dataset and code are available at https://github.com/xmed-lab/MultiEYE.
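
To make the idea of using the image-concept relation as a distillation link more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each image and each disease concept is already embedded as a vector, that each image carries a single class label, and that knowledge is matched at the class level because fundus and OCT images are unpaired. The function names (`concept_scores`, `class_prototypes`, `prototype_distill_loss`) and the squared-error objective are hypothetical choices made for illustration; the actual losses are defined in the paper and the released code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def concept_scores(image_feat, concept_feats):
    """Image-concept relation: similarity of one image to every concept."""
    return [cosine(image_feat, c) for c in concept_feats]


def class_prototypes(score_vectors, labels, num_classes):
    """Average the teacher's concept-score vectors per class.

    Returns one prototype per class, or None for classes with no samples.
    """
    dim = len(score_vectors[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for scores, y in zip(score_vectors, labels):
        counts[y] += 1
        for j, value in enumerate(scores):
            sums[y][j] += value
    return [
        [value / counts[k] for value in sums[k]] if counts[k] else None
        for k in range(num_classes)
    ]


def prototype_distill_loss(student_scores, student_labels, teacher_protos):
    """Mean squared gap between each student concept-score vector and the
    teacher prototype of the same class (assumed objective, for illustration).
    """
    total, used = 0.0, 0
    for scores, y in zip(student_scores, student_labels):
        proto = teacher_protos[y]
        if proto is None:  # no teacher sample for this class
            continue
        total += sum((a - b) ** 2 for a, b in zip(scores, proto)) / len(scores)
        used += 1
    return total / used if used else 0.0


# Toy example with two concepts and two classes (all values are made up).
concepts = [[1.0, 0.0], [0.0, 1.0]]
oct_feats, oct_labels = [[1.0, 0.0], [2.0, 0.0]], [0, 0]
fundus_feats, fundus_labels = [[0.0, 1.0]], [0]

teacher = [concept_scores(f, concepts) for f in oct_feats]
student = [concept_scores(f, concepts) for f in fundus_feats]
protos = class_prototypes(teacher, oct_labels, num_classes=2)
loss = prototype_distill_loss(student, fundus_labels, protos)
```

Because the comparison happens in concept space rather than in raw feature space, each term of such a loss can be read as "the fundus model under- or over-weights this concept relative to the OCT model", which is one way the transfer could be inspected.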
