SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval

Griffin Adams, Jason Zucker, Noémie Elhadad

TL;DR

This work tackles the burden of writing long hospital-course summaries by introducing SPEER, a sentence-level planning approach that retrieves embedded salient medical entities before generating each sentence. The authors first build Entity Synonym Groups (ESGs) from clinical notes and train an ESG salience classifier to ground summarization on a subset of salient concepts, then guide open-source LLMs (Mistral-7B-Instruct and Zephyr-7B-beta) using SPEER's Retrieve-Realize-Repeat framework. Compared with non-guided and prompt-guided baselines, SPEER improves both the coverage of salient entities and faithfulness to source content across three diverse test sets, including unseen EHRs like MIMIC. The results support treating content selection as a separate predictive task and demonstrate that explicit planning with embedded entity retrieval yields more grounded, clinically useful summaries, potentially reducing documentation burden while maintaining safety. The approach is validated on a large CUIMC dataset (~167,000 admissions) and shows robustness to domain shifts, highlighting its practical relevance for long-form clinical summarization.

Abstract

Clinicians must write a lengthy summary each time a patient is discharged from the hospital. This task is time-consuming due to the sheer number of unique clinical concepts covered in the admission. Identifying and covering salient entities is vital for the summary to be clinically useful. We fine-tune open-source LLMs (Mistral-7B-Instruct and Zephyr-7B-beta) on the task and find that they generate incomplete and unfaithful summaries. To increase entity coverage, we train a smaller, encoder-only model to predict salient entities, which are treated as content-plans to guide the LLM. To encourage the LLM to focus on specific mentions in the source notes, we propose SPEER: Sentence-level Planning via Embedded Entity Retrieval. Specifically, we mark each salient entity span with special "{ }" boundary tags and instruct the LLM to retrieve marked spans before generating each sentence. Sentence-level planning acts as a form of state tracking in that the model is explicitly recording the entities it uses. We fine-tune Mistral and Zephyr variants on a large-scale, diverse dataset of ~167k in-patient hospital admissions and evaluate on 3 datasets. SPEER shows gains in both coverage and faithfulness metrics over non-guided and guided baselines.
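
The span-marking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mark_salient_mentions`, the exact tag strings (including the surrounding spaces), and the case-insensitive string matching are all assumptions made for illustration, and the list of salient mentions is assumed to come from an upstream salience classifier.

```python
import re


def mark_salient_mentions(note, mentions, open_tag="{{ ", close_tag=" }}"):
    """Wrap each salient mention found in `note` with boundary tags.

    Hypothetical helper: `mentions` stands in for the output of a
    salience classifier; the tag strings are illustrative defaults.
    """
    if not mentions:
        return note
    # Longest-first alternation so "heart failure" wins over "heart".
    ordered = sorted(set(mentions), key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(m) for m in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    # Keep the original surface form of the mention inside the tags.
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", note)


note = "Patient admitted with heart failure. Started furosemide; denies chest pain."
salient = ["heart failure", "furosemide"]
marked = mark_salient_mentions(note, salient)
print(marked)
```

Mentions that are not predicted salient ("chest pain" here) are left untagged, so under this sketch only tagged spans would be available for retrieval during generation.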

Paper Structure (31 sections, 2 equations, 7 figures, 5 tables)

This paper contains 31 sections, 2 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Extracting entities and forming groups of synonymous entities (ESGs). For each admission, we form a set of ESGs from the source notes and content selection is performed by classifying each ESG as salient or not.
  • Figure 2: SPEER: Sentence-Level Planning via Embedded Entity Retrieval. The entire process of generating a hospital-course summary from a concatenated set of clinical notes is shown above. The first two steps relate to the formation and classification of Entity Synonym Groups (ESGs) shown in Figure 1. The next two steps visually describe the SPEER approach in § \ref{sec:guided}. First, salient entity mentions are marked with special {{ }} boundary tags, which indicate that they are allowed to be retrieved during generation. Then, during generation, each summary sentence is generated on its own line. Above each sentence line, the model is instructed to first retrieve the entities it plans to use in the following sentence simply by generating entities within the {{ }} tags. This single-pass decoding can be explained with the acronym $\mathbf{R}^3$: Retrieve-Realize-Repeat, because each sentence is a realization of a plan.
  • Figure 3: Pseudo-code to generate training data for SPEER, which includes a step for learning to select salient entities (§ \ref{sec:esg}), followed by entity plan-guided summarization (§ \ref{sec:guided}).
  • Figure 4: Comparing the entity-level performance (source-guided recall (SGR) and source-guided precision (SGP)) of explicit content selection (classifying entities with a LongFormer Encoder) versus implicit (auto-regressive decoding) on CUIMC:2010-2014.
  • Figure 5: Validation Loss for Mistral-7B-v0.1 and zephyr-7b-beta as a function of training steps across 1 epoch (covering 167k hospital admissions).
  • ...and 2 more figures
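
The Retrieve-Realize-Repeat layout described in the Figure 2 caption (a line of retrieved entities above each summary sentence) can be post-processed with a short parser. The sketch below is only an illustration under stated assumptions: the function name `parse_plan_output` and the sample text are hypothetical, and it assumes a plan line consists solely of {{ }}-tagged entities while every other non-empty line is a summary sentence.

```python
import re

# Matches one tagged entity, e.g. "{{ heart failure }}" (assumed layout).
TAG = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def parse_plan_output(generated):
    """Split generated text into (planned_entities, sentence) pairs."""
    pairs = []
    pending = []  # entities retrieved for the next sentence
    for line in generated.splitlines():
        line = line.strip()
        if not line:
            continue
        entities = TAG.findall(line)
        # A plan line contains tagged entities and nothing else.
        if entities and not TAG.sub("", line).strip():
            pending.extend(entities)
        else:
            pairs.append((pending, line))
            pending = []
    return pairs


sample = """{{ heart failure }} {{ furosemide }}
The patient was admitted with heart failure and started on furosemide.
{{ chest pain }}
She denied chest pain throughout the admission."""

pairs = parse_plan_output(sample)
summary = " ".join(sentence for _, sentence in pairs)
print(pairs)
print(summary)
```

Keeping the plan next to each sentence makes it straightforward to strip the plan lines for the final summary, or to check which retrieved entities a given sentence actually mentions.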