Table of Contents
Fetching ...

LLMs for clinical risk prediction

Mohamed Rezk, Patricia Cabanillas Silva, Fried-Michael Dahlweid

TL;DR

Findings indicate that GPT-4 exhibited significant deficiencies in identifying positive cases and struggled to provide reliable probability estimates for delirium risk, while clinalytix Medical AI demonstrated superior accuracy.

Abstract

This study compares the efficacy of GPT-4 and clinalytix Medical AI in predicting the clinical risk of delirium development. Findings indicate that GPT-4 exhibited significant deficiencies in identifying positive cases and struggled to provide reliable probability estimates for delirium risk, while clinalytix Medical AI demonstrated superior accuracy. A thorough analysis of the large language model's (LLM) outputs elucidated potential causes for these discrepancies, consistent with limitations reported in extant literature. These results underscore the challenges LLMs face in accurately diagnosing conditions and interpreting complex clinical data. While LLMs hold substantial potential in healthcare, they are currently unsuitable for independent clinical decision-making. Instead, they should be employed in assistive roles, complementing clinical expertise. Continued human oversight remains essential to ensure optimal outcomes for both patients and healthcare providers.

LLMs for clinical risk prediction

TL;DR

Findings indicate that GPT-4 exhibited significant deficiencies in identifying positive cases and struggled to provide reliable probability estimates for delirium risk, while clinalytix Medical AI demonstrated superior accuracy.

Abstract

This study compares the efficacy of GPT-4 and clinalytix Medical AI in predicting the clinical risk of delirium development. Findings indicate that GPT-4 exhibited significant deficiencies in identifying positive cases and struggled to provide reliable probability estimates for delirium risk, while clinalytix Medical AI demonstrated superior accuracy. A thorough analysis of the large language model's (LLM) outputs elucidated potential causes for these discrepancies, consistent with limitations reported in extant literature. These results underscore the challenges LLMs face in accurately diagnosing conditions and interpreting complex clinical data. While LLMs hold substantial potential in healthcare, they are currently unsuitable for independent clinical decision-making. Instead, they should be employed in assistive roles, complementing clinical expertise. Continued human oversight remains essential to ensure optimal outcomes for both patients and healthcare providers.
Paper Structure (8 sections, 1 figure, 1 table)

This paper contains 8 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Confusion Matrix Comparison of GPT-4 and clinalytix Medical AI in Predicting Delirium Risk. The figure illustrates the confusion matrices for both models, showing the distribution of true positives, true negatives, false positives, and false negatives. The comparison highlights the differences in the models' ability to accurately predict the clinical risk of delirium.