Insight: A Multi-Modal Diagnostic Pipeline using LLMs for Ocular Surface Disease Diagnosis

Chun-Hsiao Yeh, Jiayun Wang, Andrew D. Graham, Andrea J. Liu, Bo Tan, Yubei Chen, Yi Ma, Meng C. Lin

TL;DR

An innovative multi-modal diagnostic pipeline (MDPipe) is introduced that employs large language models (LLMs) for ocular surface disease diagnosis. Evaluation shows that MDPipe outperforms existing standards, including GPT-4, and provides clinically sound rationales for diagnoses.

Abstract

Accurate diagnosis of ocular surface diseases is critical in optometry and ophthalmology, and it hinges on integrating clinical data sources (e.g., meibography imaging and clinical metadata). Traditional human assessments lack precision in quantifying clinical observations, while current machine-based methods often treat diagnoses as multi-class classification problems, limiting the diagnoses to a predefined closed set of curated answers without reasoning about the clinical relevance of each variable to the diagnosis. To tackle these challenges, we introduce an innovative multi-modal diagnostic pipeline (MDPipe) by employing large language models (LLMs) for ocular surface disease diagnosis. We first employ a visual translator to interpret meibography images by converting them into quantifiable morphology data, facilitating their integration with clinical metadata and enabling the communication of nuanced medical insight to LLMs. To further advance this communication, we introduce an LLM-based summarizer to contextualize the insight from the combined morphology and clinical metadata, and generate clinical report summaries. Finally, we refine the LLMs' reasoning ability with domain-specific insight from real-life clinician diagnoses. Our evaluation across diverse ocular surface disease diagnosis benchmarks demonstrates that MDPipe outperforms existing standards, including GPT-4, and provides clinically sound rationales for diagnoses.
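
The data flow described in the abstract (image to morphology numbers, merged with clinical metadata, then handed to an LLM as text) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `PatientRecord`, `visual_translator_stub`, `build_report_prompt`, and `run_pipeline`, the morphology field names, and the numeric values are all hypothetical placeholders, and the LLM is represented by any callable that maps a prompt string to a response string.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PatientRecord:
    """Hypothetical container for the two inputs the abstract mentions."""
    morphology: Dict[str, float]  # quantified output of a visual translator
    metadata: Dict[str, object]   # non-narrative clinical metadata


def visual_translator_stub(image_id: str) -> Dict[str, float]:
    """Stand-in for a visual translator.

    A real translator would analyze a meibography image; this stub
    returns fixed, made-up values so the sketch runs on its own.
    """
    return {"gland_count": 12.0, "mean_gland_length_px": 85.5}


def build_report_prompt(record: PatientRecord) -> str:
    """Render morphology and metadata as plain text for a summarizer LLM."""
    lines = ["Summarize the following findings as a clinical report.",
             "Morphology:"]
    for name, value in sorted(record.morphology.items()):
        lines.append(f"- {name}: {value}")
    lines.append("Clinical metadata:")
    for name, value in sorted(record.metadata.items()):
        lines.append(f"- {name}: {value}")
    return "\n".join(lines)


def run_pipeline(image_id: str,
                 metadata: Dict[str, object],
                 llm: Callable[[str], str]) -> str:
    """Chain the stages: translate the image, merge with metadata, query the LLM."""
    record = PatientRecord(visual_translator_stub(image_id), metadata)
    return llm(build_report_prompt(record))
```

Passing an identity function as `llm` (for example `run_pipeline("img-001", {"age": 54}, llm=lambda p: p)`) returns the assembled prompt, which is a convenient way to inspect what text a model would receive before any real model is attached.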

Paper Structure (25 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Multi-modal diagnostic pipeline using LLMs for OSD diagnosis. The proposed pipeline utilizes 1) a visual translator to transform meibography images into quantifiable MG morphology, 2) an LLM-based summarizer to craft clinical reports, and 3) the integration of clinical knowledge to augment LLM's capability in diagnosing OSD.
  • Figure 2: (a) Illustration of the limitations of current MLLMs in processing visual data, including: 1) producing vague interpretations; 2) failing to convey clinical significance for ocular surface diseases, e.g., describing a finding only as "a black spot in the eye". (b) Our visual translator $\mathcal{V}$ is designed to interpret visual data $\mathbf{I}$ by converting them into quantifiable MG morphology data.
  • Figure 3: We employed an LLM-based summarizer to generate Q&A clinical reports to contextualize insights from both the non-narrative clinical metadata and MG morphology to enhance LLMs' learning capability.
  • Figure 4: Comparative evaluation and clinician study between MDPipe and GPT-4.