Large Language Models for Drug Overdose Prediction from Longitudinal Medical Records

Md Sultan Al Nahian; Chris Delcher; Daniel Harris; Peter Akpunonu; Ramakanth Kavuluru

TL;DR

This study evaluates GPT-4o for predicting drug overdose events from longitudinal insurance claims in the Merative MarketScan dataset, framing overdose risk as a sequential prediction task over $7$- and $30$-day windows. It compares traditional baselines (Random Forest, XGBoost) against zero-shot and fine-tuned GPT-4o across multiple LLM input representations. Fine-tuning on aggregated visit-level statistics yields the strongest results, with an $F1$ score of $84.53$ (recall $82$) for the $7$-day window, outperforming the baselines, while zero-shot prompting still achieves meaningful performance by drawing on the LLM's prior knowledge. The work identifies input design, code-versus-description formats, and context length as key factors for LLM-based clinical prediction, and discusses cost and deployment considerations for integrating such models into decision-support pipelines.

Abstract

The ability to predict drug overdose risk from a patient's medical records is crucial for timely intervention and prevention. Traditional machine learning models have shown promise in analyzing longitudinal medical records for this task. However, recent advancements in large language models (LLMs) offer an opportunity to enhance prediction performance by leveraging their ability to process long textual data and their inherent prior knowledge across diverse tasks. In this study, we assess the effectiveness of OpenAI's GPT-4o LLM in predicting drug overdose events using patients' longitudinal insurance claims records. We evaluate its performance in both fine-tuned and zero-shot settings, comparing them to strong traditional machine learning methods as baselines. Our results show that LLMs not only outperform traditional models in certain settings but can also predict overdose risk in a zero-shot setting without task-specific training. These findings highlight the potential of LLMs in clinical decision support, particularly for drug overdose risk prediction.
