Uncertainty Quantification for Clinical Outcome Predictions with (Large) Language Models

Zizhang Chen, Peizhao Li, Xiaomeng Dong, Pengyu Hong

TL;DR

The proposed multi-tasking and ensemble methods effectively reduce model uncertainty on EHR prediction tasks, and the approach extends to black-box settings, including popular proprietary LMs such as GPT-4.

Abstract

To facilitate healthcare delivery, language models (LMs) have significant potential for clinical prediction tasks using electronic health records (EHRs). However, in these high-stakes applications, unreliable decisions can result in high costs due to compromised patient safety and ethical concerns, thus increasing the need for good uncertainty modeling of automated clinical predictions. To address this, we consider the uncertainty quantification of LMs for EHR tasks in white- and black-box settings. We first quantify uncertainty in white-box models, where we can access model parameters and output logits. We show that an effective reduction of model uncertainty can be achieved by using the proposed multi-tasking and ensemble methods in EHRs. Continuing with this idea, we extend our approach to black-box settings, including popular proprietary LMs such as GPT-4. We validate our framework using longitudinal clinical data from more than 6,000 patients in ten clinical prediction tasks. Results show that ensembling methods and multi-task prediction prompts reduce uncertainty across different scenarios. These findings increase the transparency of the model in white-box and black-box settings, thus advancing reliable AI healthcare.
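
As an illustration of what ensemble-based uncertainty quantification can look like in the white-box setting, where output logits are available, the sketch below averages the class probabilities of several ensemble members and splits the predictive entropy into an expected per-member term and a disagreement term. This is a common decomposition and an assumption on our part; the abstract does not state which uncertainty measure the paper uses, and the function names and example logits here are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ensemble_uncertainty(member_logits):
    """Summarize uncertainty for one input given logits from each ensemble member.

    member_logits: list of per-member logit lists, all of the same length.
    Returns the averaged probabilities plus three entropy-based quantities.
    """
    member_probs = [softmax(logits) for logits in member_logits]
    n_members = len(member_probs)
    n_classes = len(member_probs[0])

    # Ensemble prediction: average the members' probabilities class by class.
    mean_probs = [
        sum(p[c] for p in member_probs) / n_members for c in range(n_classes)
    ]

    total = entropy(mean_probs)  # entropy of the averaged prediction
    expected = sum(entropy(p) for p in member_probs) / n_members  # mean member entropy
    return {
        "mean_probs": mean_probs,
        "total": total,
        "expected_member_entropy": expected,
        "disagreement": total - expected,  # grows when members disagree
    }


# Hypothetical logits for a binary task (e.g. a long length-of-stay label).
agreeing = ensemble_uncertainty([[2.0, 0.0], [2.0, 0.0], [2.0, 0.0]])
disagreeing = ensemble_uncertainty([[3.0, 0.0], [0.0, 3.0]])
```

When all members produce the same distribution, the disagreement term is approximately zero; when members are individually confident but contradict one another, the averaged prediction is close to uniform and the disagreement term is large.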

Paper Structure

This paper contains 24 sections, 6 equations, 1 figure, 5 tables.

Figures (1)

  • Figure 1: EHR predictions with medical code sequences. Left: structured, longitudinal medical tokens; each code is in OMOP format [reich2024ohdsi] and is associated with a specific time point. We translate these codes into natural language that describes a patient's timeline. Right: the interpreted EHR data can serve various clinical applications, such as Long Length of Stay or Hypoglycemia prediction.