Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models

Jiankun Wang, Sumyeong Ahn, Taykhoom Dalal, Xiaodan Zhang, Weishen Pan, Qiannan Zhang, Bin Chen, Hiroko H. Dodge, Fei Wang, Jiayu Zhou

TL;DR

This work tackles ADRD risk prediction from electronic health records (EHRs) with a collaborative pipeline that combines supervised learning methods (SLs) and large language models (LLMs) through a confidence-guided routing mechanism. The method summarizes tabular EHR data into natural language, trains SLs on the data, and keeps the SL prediction for confident cases while routing uncertain cases to LLMs with in-context learning (ICL), with ICL demonstrations drawn from a reliable subset. Key contributions include a detailed data construction pipeline from the OHSU EHR warehouse, a principled confidence-driven decision rule using $\sigma$, and extensive ablations demonstrating the benefits of EHR summarization, similarity-based demonstration retrieval, and denoising strategies, along with the insight that larger or medically fine-tuned models do not uniformly improve performance. Empirical results on six CP/PW configurations (CP{1, 2} with PW{0, 1, 3}, where PW is the prediction window in years) show improved F1 scores over baselines, suggesting practical value for early ADRD screening and patient management, though the task remains challenging because clinical notes and demographic data are missing. The work highlights the potential of combining SLs and LLMs for healthcare screening applications and motivates further exploration of model selection, representation of structured medical data for reasoning, and scalable ICL strategies in clinical domains.
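
The confidence-guided routing described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that $\sigma$ acts as a threshold on the SL's confidence in its predicted class, that the SL outputs a probability for the positive (ADRD) class, and that `llm_predict` is a hypothetical callable standing in for the LLM with ICL demonstrations.

```python
from typing import Callable, List, Sequence, Tuple


def route_predictions(
    sl_probs: Sequence[float],
    sigma: float,
    llm_predict: Callable[[int], int],
) -> Tuple[List[int], List[int]]:
    """Return (final predictions, indices of samples routed to the LLM).

    sl_probs[i] is the SL's probability of the positive class for sample i.
    sigma is an ASSUMED confidence threshold in [0.5, 1.0].
    llm_predict is a HYPOTHETICAL stand-in for the LLM + ICL predictor.
    """
    if not 0.5 <= sigma <= 1.0:
        raise ValueError("sigma must lie in [0.5, 1.0]")

    predictions: List[int] = []
    routed: List[int] = []
    for i, p in enumerate(sl_probs):
        # Confidence is the probability of whichever class the SL favors.
        confidence = max(p, 1.0 - p)
        if confidence >= sigma:
            # Confident case: keep the SL's own label.
            predictions.append(int(p >= 0.5))
        else:
            # Uncertain case: defer to the LLM.
            predictions.append(llm_predict(i))
            routed.append(i)
    return predictions, routed


if __name__ == "__main__":
    # Toy probabilities; the lambda is a placeholder that always answers 1.
    preds, routed = route_predictions([0.95, 0.55, 0.10, 0.40], 0.8, lambda i: 1)
    print(preds, routed)
```

Under these assumptions, raising `sigma` sends more samples to the LLM and lowering it keeps more SL predictions, which is the trade-off a threshold analysis would explore.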

Abstract

Alzheimer's disease (AD) is the fifth-leading cause of death among Americans aged 65 and older. Screening and early detection of AD and related dementias (ADRD) are critical for timely intervention and for identifying clinical trial participants. The widespread adoption of electronic health records (EHRs) offers an important resource for developing ADRD screening tools such as machine-learning-based predictive models. Recent advancements in large language models (LLMs) demonstrate their unprecedented capability of encoding knowledge and performing reasoning, which gives them strong potential for enhancing risk prediction. This paper proposes a novel pipeline that augments risk prediction by leveraging the few-shot inference power of LLMs to make predictions on cases where traditional supervised learning methods (SLs) may not excel. Specifically, we develop a collaborative pipeline that combines SLs and LLMs via a confidence-driven decision-making mechanism, leveraging the strengths of SLs in clear-cut cases and LLMs in more complex scenarios. We evaluate this pipeline using a real-world EHR data warehouse from Oregon Health & Science University (OHSU) Hospital, encompassing EHRs from over 2.5 million patients and more than 20 million patient encounters. Our results show that our proposed approach effectively combines the power of SLs and LLMs, offering significant improvements in predictive performance. This advancement holds promise for revolutionizing ADRD screening and early detection practices, with potential implications for better patient management strategies and thus improved healthcare.

Paper Structure (30 sections, 7 equations, 5 figures, 9 tables, 2 algorithms)

This paper contains 30 sections, 7 equations, 5 figures, 9 tables, 2 algorithms.

Figures (5)

  • Figure 1: Construction of control samples for each case. Among all patients, we select those with a time gap within one year of the case who also have an encounter within six months after the case's ADRD diagnosis. The observation period runs from the first encounter date to the start of the prediction window, whose length is chosen from {0, 1, 3} years. A longer prediction window therefore means predicting ADRD onset further into the future.
  • Figure 2: Framework Overview. We combine the power of supervised learning methods (SLs) and advanced Large Language Models (LLMs) to build risk prediction for the onset of AD/ADRD from Electronic Health Records (EHRs). We first use an LLM to summarize the tabular data in EHRs. Then, we train SLs (e.g., LR, XGB, MLP) and split the samples into confidently and unconfidently predicted groups. Finally, we perform in-context sample selection for the unconfident samples and predict them using LLMs.
  • Figure 3: Analysis of different confidence thresholds used in our pipeline on CP{1, 2}/PW{0, 1, 3}.
  • Figure 4: Comparison of performance across different model sizes and fine-tuning on specific medical datasets.
  • Figure S1: Full pipeline for summarizing tabular EHRs. We first convert each patient's information, which is stored as one table row, into a sentence by concatenating its fields. We then leverage the LLM's summarization capability by feeding it the concatenated sentence together with a guiding prompt for the summarization task.
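
The two steps in the Figure S1 caption (row-to-sentence concatenation, then a guided summarization prompt) can be sketched roughly as follows. This is only an illustration: the column names, the "column is value" template, and the prompt wording are all hypothetical and are not taken from the paper, and the actual LLM call is left out.

```python
from typing import Mapping


def row_to_sentence(row: Mapping[str, object]) -> str:
    """Concatenate the non-empty fields of one patient row into a sentence.

    The "<column> is <value>" template is an ASSUMED serialization format.
    """
    parts = [
        f"{column} is {value}"
        for column, value in row.items()
        if value is not None and value != ""
    ]
    return (". ".join(parts) + ".") if parts else ""


def build_summary_prompt(sentence: str) -> str:
    """Wrap the concatenated sentence in a HYPOTHETICAL guiding prompt."""
    return (
        "Summarize the following patient record in a short paragraph.\n\n"
        f"Record: {sentence}\n\n"
        "Summary:"
    )


if __name__ == "__main__":
    # Toy row with made-up column names; missing values are skipped.
    row = {"age": 72, "diagnosis": "hypertension", "medication": None}
    prompt = build_summary_prompt(row_to_sentence(row))
    print(prompt)  # This string would then be sent to an LLM of choice.
```

Skipping empty fields keeps the serialized record short, which matters when several such records also have to fit into an ICL prompt.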