GPT-4V(ision) Unsuitable for Clinical Care and Education: A Clinician-Evaluated Assessment

Senthujan Senkaiahliyan; Augustin Toma; Jun Ma; An-Wen Chan; Andrew Ha; Kevin R. An; Hrishikesh Suresh; Barry Rubin; Bo Wang

TL;DR

Although GPT-4V is able to identify and explain medical images, its diagnostic accuracy and clinical decision-making abilities are poor, posing risks to patient safety.

Abstract

OpenAI's large multimodal model, GPT-4V(ision), was recently developed for general image interpretation. However, less is known about its capabilities in medical image interpretation and diagnosis. Board-certified physicians and senior residents assessed GPT-4V's proficiency across a range of medical conditions using imaging modalities such as CT scans, MRIs, ECGs, and clinical photographs. Although GPT-4V is able to identify and explain medical images, its diagnostic accuracy and clinical decision-making abilities are poor, posing risks to patient safety. Despite the potential of large language models to enhance medical education and healthcare delivery, the current limitations of GPT-4V in interpreting medical images reinforce the importance of appropriate caution when using it for clinical decision-making.

Paper Structure (22 sections, 5 figures, 4 tables)

1. Introducing GPT-4V(ision)
2. Data Collection
3. Experimental Setup
4. Results
5. Discussion and Limitations
Supplementary Notes

Figures (5)

Figure 1: Evaluation platform used to collect clinician feedback on GPT-4V's output.
Figure 2: Evaluation of GPT-4V's Interpretations on Medical Images with Expert Feedback
Figure 3: Case Study 1: MRI
Figure 4: Case Study 2: CT
Figure 5: Case Study 3: ECG