EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model
Sijing Li, Tianwei Lin, Lingshuai Lin, Wenqiao Zhang, Jiang Liu, Xiaoda Yang, Juncheng Li, Yucheng He, Xiaohui Song, Jun Xiao, Yueting Zhuang, Beng Chin Ooi
TL;DR
The paper introduces Eyecare Kit to address three core gaps in ophthalmology AI: data quality, benchmarking, and model architecture. It provides Eyecare-100K, a large, multi-modal ophthalmic visual-instruction dataset; Eyecare-Bench, a multi-task benchmark with EyeEval-based metrics and GPT-4-facilitated evaluation; and EyecareGPT, a high-resolution LVLM with a Layer-wise Dense Connector and an Adaptive Anyres Mechanism. Empirical results show that EyecareGPT attains state-of-the-art performance on closed QA, open QA, and report generation across multiple modalities. It significantly outperforms baseline LVLMs and benefits greatly from fine-tuning on Eyecare-100K. The work emphasizes the value of domain-specific data and evaluation in advancing open research on intelligent ophthalmic diagnosis, and it makes its resources openly available to the community.
Abstract
Medical Large Vision-Language Models (Med-LVLMs) demonstrate significant potential in healthcare, but their reliance on general medical data and coarse-grained global visual understanding limits them in intelligent ophthalmic diagnosis. Currently, intelligent ophthalmic diagnosis faces three major challenges: (i) Data: the lack of deeply annotated, high-quality, multi-modal ophthalmic visual instruction data; (ii) Benchmark: the absence of a comprehensive and systematic benchmark for evaluating diagnostic performance; (iii) Model: the difficulty of adapting holistic visual architectures to fine-grained, region-specific ophthalmic lesion identification. In this paper, we propose the Eyecare Kit, which systematically tackles these three challenges with a tailored dataset, benchmark, and model. First, we construct a multi-agent data engine with real-life ophthalmology data to produce Eyecare-100K, a high-quality ophthalmic visual instruction dataset. Second, we design Eyecare-Bench, a benchmark that comprehensively evaluates the overall performance of LVLMs on intelligent ophthalmic diagnosis tasks across multiple dimensions. Finally, we develop EyecareGPT, which is thoroughly optimized for fine-grained ophthalmic visual understanding and incorporates an adaptive resolution mechanism and a layer-wise dense connector. Extensive experimental results indicate that EyecareGPT achieves state-of-the-art performance across a range of ophthalmic tasks, underscoring its significant potential for advancing open research in intelligent ophthalmic diagnosis. Our project is available at https://github.com/DCDmllm/EyecareGPT.
