Dual-stage and Lightweight Patient Chart Summarization for Emergency Physicians

Jiajun Wu, Swaleh Zaidi, Braden Teitge, Henry Leung, Jiayu Zhou, Jessalyn Holodinsky, Steve Drew

TL;DR

The paper tackles the need for fast, privacy-preserving access to salient patient information in emergency departments (EDs) by proposing a fully offline, dual-device EHR summarizer implemented on Jetson Orin Nano hardware. It splits the work across two devices: Nano-R handles retrieval and Nano-S handles summarization, with a socket-based link transferring data between them and a locally hosted small language model generating the summary. The two-part ED-focused output (Critical Findings and Context-Specific Summary) is paired with a Factual Accuracy (FA) evaluation framework that uses LLMs as judges to assess reliability without gold references. Experimental results on MIMIC-IV and real EHRs show competitive FA, completeness, and clarity with end-to-end latency on the order of seconds to a minute, highlighting potential for field deployment in privacy-constrained, connectivity-challenged settings.

Abstract

Electronic health records (EHRs) contain extensive unstructured clinical data that can overwhelm emergency physicians trying to identify critical information. We present a two-stage summarization system that runs entirely on embedded devices, enabling offline clinical summarization while preserving patient privacy. In our approach, a dual-device architecture first retrieves relevant patient record sections on one Jetson device, Nano-R (Retrieve), then generates a structured summary on a second Jetson device, Nano-S (Summarize); the two communicate via a lightweight socket link. The summarization output is two-fold: (1) a fixed-format list of critical findings, and (2) a context-specific narrative focused on the clinician's query. The retrieval stage uses locally stored EHRs, splits long notes into semantically coherent sections, and searches for the most relevant sections per query. The generation stage uses a locally hosted small language model (SLM) to produce the summary from the retrieved text, operating within the constraints of two NVIDIA Jetson devices. We first benchmarked six open-source SLMs under 7B parameters to identify viable models. We incorporated an LLM-as-Judge evaluation mechanism to assess summary quality in terms of factual accuracy, completeness, and clarity. Preliminary results on MIMIC-IV and de-identified real EHRs demonstrate that our fully offline system can produce useful summaries in under 30 seconds.
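
To make the retrieve-then-transfer flow described in the abstract more concrete, the following is a minimal Python sketch of what the Nano-R side might look like. Everything in it is an illustrative assumption rather than the authors' implementation: sections are split on blank lines instead of by semantic coherence, relevance is scored with a bag-of-words cosine similarity instead of whatever retriever the paper uses, and the socket messages are length-prefixed JSON. The function names (split_sections, retrieve, send_msg, recv_msg) are hypothetical.

```python
import json
import math
import re
import socket
import struct
from collections import Counter


def split_sections(note: str) -> list[str]:
    """Split a note on blank lines (a crude stand-in for semantic sectioning)."""
    return [s.strip() for s in re.split(r"\n\s*\n", note) if s.strip()]


def _bag_of_words(text: str) -> Counter:
    """Lowercased token counts; an assumed, deliberately simple representation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(note: str, query: str, k: int = 2) -> list[str]:
    """Return the k sections most similar to the query."""
    q = _bag_of_words(query)
    sections = split_sections(note)
    ranked = sorted(sections, key=lambda s: cosine(_bag_of_words(s), q), reverse=True)
    return ranked[:k]


def send_msg(sock: socket.socket, payload: dict) -> None:
    """Send one JSON message prefixed with its 4-byte big-endian length."""
    data = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack("!I", len(data)) + data)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv may return fewer than requested."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before full message arrived")
        buf += chunk
    return buf


def recv_msg(sock: socket.socket) -> dict:
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

In this sketch the retrieval device would call retrieve on a locally stored note and pass the result to send_msg over a TCP connection, while the summarization device would call recv_msg and hand the sections to its local SLM. The length prefix is one common way to frame messages on a stream socket; the paper's actual wire format is not stated in this section.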

Paper Structure

This paper contains 32 sections, 5 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Dual-stage on-device architecture, enabling low-latency and privacy-preserving inference at the point of care.
  • Figure 2: Photograph of our two Jetson Nanos connected locally. Nano-R (left) retrieves EHR context, while Nano-S (right) generates the two-mode summary (critical + context-specific) on-device.
  • Figure 3: Two-mode ED chart summarization: from EHR input and the chief complaint, the system produces (i) a Critical summary of the top must-know facts and (ii) a Context-based summary focused on the current complaint.
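
The two-mode output described in Figure 3 can be pictured as a small data structure plus a prompt that asks the model for both parts. The sketch below is only an illustration under stated assumptions: the EDSummary class, the build_prompt and parse_summary helpers, and the "CRITICAL FINDINGS:" / "CONTEXT SUMMARY:" text layout are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class EDSummary:
    """Hypothetical container for the two-mode output."""
    critical_findings: list[str] = field(default_factory=list)
    context_summary: str = ""


def build_prompt(chief_complaint: str, sections: list[str]) -> str:
    """Assemble an assumed prompt asking a local SLM for both output modes."""
    context = "\n\n".join(sections)
    return (
        "You are summarizing a patient chart for an emergency physician.\n"
        f"Chief complaint: {chief_complaint}\n\n"
        f"Chart excerpts:\n{context}\n\n"
        "Answer in exactly this layout:\n"
        "CRITICAL FINDINGS:\n- <one finding per line>\n"
        "CONTEXT SUMMARY:\n<short narrative focused on the chief complaint>\n"
    )


def parse_summary(raw: str) -> EDSummary:
    """Parse model text in the assumed layout into an EDSummary."""
    summary = EDSummary()
    mode = None
    narrative: list[str] = []
    for line in raw.splitlines():
        stripped = line.strip()
        upper = stripped.upper()
        if upper.startswith("CRITICAL FINDINGS"):
            mode = "critical"
        elif upper.startswith("CONTEXT SUMMARY"):
            mode = "context"
        elif mode == "critical" and stripped.startswith("-"):
            # Drop the bullet marker and surrounding spaces.
            summary.critical_findings.append(stripped.lstrip("- ").strip())
        elif mode == "context" and stripped:
            narrative.append(stripped)
    summary.context_summary = " ".join(narrative)
    return summary
```

Keeping the critical findings as a fixed-format list and the context-specific part as free text mirrors the split the paper describes; a stricter format such as JSON would be another reasonable choice if the chosen SLM follows it reliably.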