EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models

João Matos; Jack Gallifant; Jian Pei; A. Ian Wong

TL;DR

This paper introduces EHRmonize, a framework that uses large language models to automatically abstract medical concepts from electronic health records, addressing the challenges of heterogeneous EHR terminology and coding systems. It combines SQL-based corpus generation from real-world datasets (MIMIC-IV and eICU-CRD) with few-shot prompting across multiple LLMs to perform two free-text extraction tasks and six binary classification tasks, focusing on medication data. The approach achieves strong performance, notably GPT-4o with 10-shot prompting yielding perfect antibiotic classification and near-perfect route identification, while delivering substantial annotation-time savings (~60–67%) and maintaining clinician oversight. These results suggest that LLM-driven EHR data harmonization can accelerate data preparation for healthcare research, and the open-source implementation lowers barriers for broader adoption and further development, albeit with limitations in dataset size and domain coverage.

Abstract

Electronic health records (EHRs) contain vast amounts of complex data, but harmonizing and processing this information remains a challenging and costly task requiring significant clinical expertise. While large language models (LLMs) have shown promise in various healthcare applications, their potential for abstracting medical concepts from EHRs remains largely unexplored. We introduce EHRmonize, a framework leveraging LLMs to abstract medical concepts from EHR data. Our study uses medication data from two real-world EHR databases to evaluate five LLMs on two free-text extraction and six binary classification tasks across various prompting strategies. GPT-4o with 10-shot prompting achieved the highest performance in all tasks, joined by Claude-3.5-Sonnet in a subset of tasks. GPT-4o achieved an accuracy of 97% in identifying generic route names, 82% for generic drug names, and 100% in performing binary classification of antibiotics. While EHRmonize significantly enhances efficiency, reducing annotation time by an estimated 60%, we emphasize that clinician oversight remains essential. Our framework, available as a Python package, offers a promising tool to assist clinicians in EHR data abstraction, potentially accelerating healthcare research and improving data harmonization processes.
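To make the few-shot setup for the binary classification tasks more concrete, the following is a minimal sketch of how an n-shot prompt for the antibiotic task might be assembled and how a model reply might be parsed. It is not the EHRmonize package API: the names `FEW_SHOT_EXAMPLES`, `build_prompt`, and `parse_binary_answer`, the prompt wording, and the example drug strings are all assumptions made for illustration, and no LLM call is included.

```python
from typing import List, Tuple

# Hypothetical labeled examples: (raw drug string, is_antibiotic).
# Illustrative only; these are not records from MIMIC-IV or eICU-CRD.
FEW_SHOT_EXAMPLES: List[Tuple[str, bool]] = [
    ("vancomycin 1 g in NaCl 0.9% 250 mL IV", True),
    ("metoprolol tartrate 25 mg tab", False),
    ("cefazolin 2 g IV push", True),
    ("acetaminophen 650 mg PO", False),
]


def build_prompt(raw_drug: str,
                 examples: List[Tuple[str, bool]],
                 n_shots: int) -> str:
    """Assemble an n-shot prompt for binary antibiotic classification."""
    if n_shots < 0:
        raise ValueError("n_shots must be non-negative")
    lines = [
        "You are assisting with EHR data abstraction. "
        "Reply 1 if the drug is an antibiotic, otherwise reply 0.",
        "",
    ]
    # Each shot is a worked example in the same format as the final query.
    for text, label in examples[:n_shots]:
        lines.append(f"Drug: {text}")
        lines.append(f"Answer: {int(label)}")
        lines.append("")
    # The query is left open so the model completes the answer.
    lines.append(f"Drug: {raw_drug}")
    lines.append("Answer:")
    return "\n".join(lines)


def parse_binary_answer(response: str) -> bool:
    """Map a model reply starting with '0' or '1' to a boolean label."""
    token = response.strip()[:1]
    if token not in ("0", "1"):
        raise ValueError(f"unexpected model reply: {response!r}")
    return token == "1"
```

In use, the string returned by `build_prompt("piperacillin-tazobactam 4.5 g IV", FEW_SHOT_EXAMPLES, 2)` would be sent to whichever model is under evaluation, and the reply passed to `parse_binary_answer`. Raising an error on an unparseable reply, instead of guessing a label, keeps ambiguous outputs visible for clinician review.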
