On Using Large Language Models to Enhance Clinically-Driven Missing Data Recovery Algorithms in Electronic Health Records

Sarah C. Lotspeich, Abbey Collins, Brian J. Wells, Ashish K. Khanna, Joseph Rigdon, Lucy D'Agostino McGowan

TL;DR

Clinically-driven algorithms (enhanced by LLM) can recover missing EHR data with similar accuracy to chart reviews and can feasibly be applied to large samples. Extending them to monitor other dimensions of data quality is a promising future direction.

Abstract

Objective: Electronic health records (EHR) data are prone to missingness and errors. Previously, we devised an "enriched" chart review protocol where a "roadmap" of auxiliary diagnoses (anchors) was used to recover missing values in EHR data (e.g., a diagnosis of impaired glycemic control might imply that a missing hemoglobin A1c value would be considered unhealthy). Still, chart reviews are expensive and time-intensive, which limits the number of patients whose data can be reviewed. Now, we investigate the accuracy and scalability of a roadmap-driven algorithm, based on ICD-10 codes (International Classification of Diseases, 10th revision), to mimic expert chart reviews and recover missing values. Materials and Methods: In addition to the clinicians' original roadmap from our previous work, we consider new versions that were iteratively refined using large language models (LLM) in conjunction with clinical expertise to expand the list of auxiliary diagnoses. Using chart reviews for 100 patients from the EHR at an extensive learning health system, we examine algorithm performance with different roadmaps. In the larger study of $1000$ patients, we apply the final algorithm, which uses a roadmap with clinician-approved additions from the LLM. Results: The algorithm recovered as much missing data as the expert chart reviewers, if not more, depending on the roadmap. Discussion: Clinically-driven algorithms (enhanced by LLM) can recover missing EHR data with similar accuracy to chart reviews and can feasibly be applied to large samples. Extending them to monitor other dimensions of data quality (e.g., plausibility) is a promising future direction.
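
The roadmap-driven recovery rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `recover_missing`, the component names, and the roadmap contents are hypothetical, and the ICD-10 prefixes are illustrative placeholders rather than the study's actual anchors. The sketch assumes that a missing component is set to "unhealthy" when any of the patient's diagnosis codes matches an anchor for that component, and that observed values are never changed.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical roadmap: component name -> ICD-10 code prefixes used as anchors.
# These entries are illustrative only and are NOT the study's roadmap.
ROADMAP: Dict[str, Set[str]] = {
    "hemoglobin_a1c": {"R73", "E11"},  # e.g., elevated blood glucose, type 2 diabetes
    "systolic_bp": {"I10"},            # e.g., essential hypertension
}


def recover_missing(
    components: Dict[str, Optional[str]],
    diagnoses: Iterable[str],
    roadmap: Dict[str, Set[str]] = ROADMAP,
) -> Dict[str, Optional[str]]:
    """Return a copy of `components` with missing values (None) set to
    "unhealthy" when an anchor diagnosis for that component is present."""
    codes = [code.strip().upper() for code in diagnoses]
    recovered = dict(components)
    for name, value in components.items():
        if value is not None:
            continue  # observed values are never overwritten
        anchors = roadmap.get(name, set())
        if any(code.startswith(prefix) for code in codes for prefix in anchors):
            recovered[name] = "unhealthy"
    return recovered


# Usage: one patient with a missing A1c and an anchor diagnosis on record.
patient = {"hemoglobin_a1c": None, "systolic_bp": "healthy", "bmi": None}
augmented = recover_missing(patient, ["R73.03", "I10"])
```

In this example the missing `hemoglobin_a1c` is recovered because an anchor code is present, the observed `systolic_bp` is left alone, and `bmi` stays missing because the hypothetical roadmap has no anchors for it. Prefix matching is a design choice made here for brevity; a real roadmap might list exact codes instead.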

Paper Structure

This paper contains 21 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Flow charts of the $1000$ data points included in the expert chart reviews ($10$ components per patient across $n = 100$ patients). We compare the extracted electronic health records (EHR) data to the A) expert chart reviews and the missing data recovery algorithms based on the following roadmaps: B) clinicians' original, C) LLM (baseline), D) LLM (context), and E) LLM (context + clinicians). Shaded boxes denote data points that were missing in the EHR data; these are the only data points that could potentially be changed by the missing data recovery algorithms in B--E, and a missing data point could only be changed to "unhealthy."
  • Figure 2: Counts of missing allostatic load index components A) per component and B) per patient across those chosen for expert chart reviews ($n = 100$ patients). Missingness should be highest according to the unvalidated electronic health records (EHR) data, while the chart review might reduce it. The different missing data recovery algorithms reduced the amount of missingness by varying amounts, depending on the roadmap: LLM (baseline), clinicians' original, LLM (context + clinicians), or LLM (context).
  • Figure 3: Counts of missing allostatic load index components A) per patient and B) per component across the full sample ($N = 1000$ patients). Missingness should be highest according to the unvalidated electronic health records (EHR) data, while the chart review might reduce it. We only considered algorithmic recovery based on the LLM (context + clinicians) roadmap, which could be applied across the full sample and sometimes recovered more missing data than the expert chart reviews.
  • Figure 4: Many patients' expert chart reviewed and algorithmically augmented allostatic load indices (ALIs) differed from the version in their extracted electronic health records (EHR) data. Still, the loess smoothers (solid lines) capturing the relationship between the validated/augmented and unvalidated ALIs were relatively linear and fell close to the line of equality (i.e., where unvalidated and validated/augmented ALI were equal).
  • Figure 5: Coefficient estimates (95% confidence intervals) using the extracted electronic health records (EHR) data (naive analysis), combined EHR data + expert chart reviews (previous study estimates; Lotspeich et al., 2025), and the missing data recovery algorithm using the LLM (context + clinicians) roadmap (augmented analysis).
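
The missingness summaries in Figures 2 and 3 (counts per component and per patient) could be tabulated along the following lines. This is a sketch under stated assumptions, not the authors' code: the function name `missing_counts` and the record layout (one dictionary per patient, with `None` marking a missing component) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def missing_counts(
    patients: List[Dict[str, Optional[str]]],
) -> Tuple[Counter, Counter]:
    """Tabulate missingness two ways:
    per_component[name] = number of patients missing that component;
    per_patient[k]      = number of patients missing exactly k components."""
    per_component: Counter = Counter()
    per_patient: Counter = Counter()
    for record in patients:
        n_missing = 0
        for name, value in record.items():
            if value is None:  # None marks a missing component (assumption)
                per_component[name] += 1
                n_missing += 1
        per_patient[n_missing] += 1
    return per_component, per_patient


# Usage with three hypothetical patients and two components.
records = [
    {"hemoglobin_a1c": None, "systolic_bp": "healthy"},
    {"hemoglobin_a1c": None, "systolic_bp": None},
    {"hemoglobin_a1c": "unhealthy", "systolic_bp": "healthy"},
]
by_component, by_patient = missing_counts(records)
```

Running the same tabulation on the extracted EHR data, the chart-reviewed data, and each algorithmically augmented version would give the kind of side-by-side comparison the captions describe.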