Generating Querying Code from Text for Multi-Modal Electronic Health Record
Mengliang Zhang
TL;DR
This work tackles the problem of querying electronic health records (EHRs) that combine tabular data with unstructured clinical text, using a pipeline that translates natural language questions into query code. It introduces the TQGen dataset and the TQGen-EHRQuery framework, which integrates a medical knowledge module, question template matching, and a toolset for processing long texts, along with a code execution repair loop to make queries more reliable. Key contributions include constructing a table-text EHR query dataset from MIMIC-IV/CXR/Note, designing a modular NL-to-SQL workflow, and demonstrating experimentally that the modular components and larger models improve query accuracy and robustness. The approach has practical implications for speeding up clinician information retrieval and enabling more accurate multimodal EHR queries in real-world settings.
Abstract
Electronic health records (EHRs) contain extensive structured and unstructured data, including tabular information and free-text clinical notes. Querying relevant patient information often requires complex database operations, increasing the workload for clinicians. Moreover, complex table relationships and professional terminology in EHRs limit query accuracy. In this work, we construct a publicly available dataset, TQGen, that integrates both Tables and clinical Text for natural language-to-query Generation. To address the challenges posed by complex medical terminology and diverse question types in EHRs, we propose TQGen-EHRQuery, a framework comprising a medical knowledge module and a question template matching module. For processing medical text, we introduce the concept of a toolset, which encapsulates the text processing module as a callable tool, thereby improving processing efficiency and flexibility. We conduct extensive experiments to assess the effectiveness of our dataset and workflow, demonstrating their potential to enhance information querying in EHR systems.
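The code execution repair loop mentioned in the TL;DR can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `run_with_repair` and `toy_generate`, the retry limit, and the in-memory SQLite table are all assumptions made for illustration, and the toy generator stands in for whatever model actually produces the query.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical signature: takes the question and the last error (if any),
# returns a candidate SQL string.
Generator = Callable[[str, Optional[str]], str]


def run_with_repair(conn: sqlite3.Connection, question: str,
                    generate: Generator, max_attempts: int = 3) -> list:
    """Generate a query, execute it, and feed any database error back
    to the generator until the query runs or attempts are exhausted."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate(question, error)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            error = str(exc)  # passed to the next generation attempt
    raise RuntimeError(
        f"no executable query after {max_attempts} attempts: {error}")


def toy_generate(question: str, error: Optional[str]) -> str:
    """Stand-in for a model: the first attempt uses a wrong table name,
    the retry (which sees the error message) uses the right one."""
    if error is None:
        return "SELECT COUNT(*) FROM admission"
    return "SELECT COUNT(*) FROM admissions"


# Toy table for the example only; not real patient data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admissions (subject_id INTEGER, hadm_id INTEGER)")
conn.executemany("INSERT INTO admissions VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 20)])

rows = run_with_repair(conn, "How many admissions are recorded?", toy_generate)
print(rows)  # [(3,)]
```

The design choice in this sketch is that the error message from the database is the only feedback passed back to the generator; a real system might also pass the failed query, the schema, or partial results.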
