OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue

Weihao Gao, Zhuo Deng, Zhiyuan Niu, Fuju Rong, Chucheng Chen, Zheng Gong, Wenze Zhang, Daimin Xiao, Fang Li, Zhenjie Cao, Zhaoyi Ma, Wenbin Wei, Lan Ma

TL;DR

The paper applies large language-and-vision models to ophthalmology by building OphGLM, an ophthalmic assistant that takes fundus images as input. It builds two fine-tuning data sources, knowledge-graph-based instructions and real-world doctor-patient dialogues, and uses them to fine-tune a dialogue-capable LLM (ChatGLM) alongside a fundus diagnosis pipeline that performs disease classification and lesion segmentation. The system merges the pipeline's structured diagnostic reports with dialogue to generate clinically relevant responses; the authors report strong performance across tasks and argue that the approach has potential clinical utility. They also plan to release the data, code, and models, and to extend the approach to additional imaging modalities such as OCT.

Abstract

Large multimodal language models (LMMs) have achieved significant success in general domains. However, because medical images and text differ significantly from general web content, the performance of LMMs in medical scenarios is limited. In ophthalmology, clinical diagnosis relies on multiple modalities of medical images, but multimodal ophthalmic large language models have not been explored to date. In this paper, we study and construct an ophthalmic large multimodal model. First, we use fundus images as an entry point to build a disease assessment and diagnosis pipeline for common ophthalmic disease diagnosis and lesion segmentation. Then, we establish a new ophthalmic multimodal instruction-following and dialogue fine-tuning dataset based on disease-related knowledge data and publicly available real-world medical dialogues. We introduce visual ability into the large language model to build the ophthalmic large language and vision assistant (OphGLM). Our experimental results demonstrate that the OphGLM model performs exceptionally well, and it has the potential to revolutionize clinical applications in ophthalmology. The dataset, code, and models will be made publicly available at https://github.com/ML-AILab/OphGLM.
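
As a rough illustration of how the output of a fundus diagnosis pipeline might be turned into text and combined with a patient question before it reaches the language model, here is a minimal Python sketch. It is not the authors' implementation: the `FundusReport` class, the `report_to_text` and `build_prompt` helpers, the 0.5 threshold, the prompt template, and the disease and lesion names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FundusReport:
    """Hypothetical structured output of a fundus diagnosis pipeline."""
    disease_probs: dict[str, float]  # classifier score per disease (illustrative labels)
    lesions: list[str] = field(default_factory=list)  # lesion types found by segmentation


def report_to_text(report: FundusReport, threshold: float = 0.5) -> str:
    """Render the report as one line of text, keeping diseases scored at or above the threshold."""
    findings = [
        f"{name} (score {score:.2f})"
        for name, score in sorted(report.disease_probs.items(), key=lambda kv: -kv[1])
        if score >= threshold
    ]
    disease_line = ", ".join(findings) if findings else "no disease above threshold"
    lesion_line = ", ".join(report.lesions) if report.lesions else "none detected"
    return f"Suspected diseases: {disease_line}. Lesions: {lesion_line}."


def build_prompt(report: FundusReport, question: str) -> str:
    """Prepend the report text to the patient question (assumed template, not from the paper)."""
    return (
        "Fundus image findings:\n"
        f"{report_to_text(report)}\n\n"
        f"Patient question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    example = FundusReport(
        disease_probs={"diabetic retinopathy": 0.91, "glaucoma": 0.12},
        lesions=["hemorrhage"],
    )
    print(build_prompt(example, "Is this serious?"))
```

Keeping the report as plain text means the language model needs no architectural change to consume it; whether OphGLM uses this exact interface is an assumption of the sketch.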

Paper Structure

This paper contains 17 sections, 2 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Instruction fine-tuning datasets based on five different scenarios.
  • Figure 2: The process of building the fine-tuned Fundus dialog dataset.
  • Figure 3: Overall architecture of the proposed OphGLM. (a) Fundus diagnosis pipeline. (b) OphGLM fine-tuning pipeline.
  • Figure 4: Knowledge Based Prompt.
  • Figure 5: Conversation Based Prompt.
  • ...and 1 more figure