
Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary

Xingmeng Zhao, Tongnian Wang, Anthony Rios

TL;DR

The paper addresses radiology report summarization with LaypersonPrompt, a prompting strategy that first generates layperson-friendly summaries to normalize observations and then derives expert impressions from them. It combines this intermediate step with few-shot in-context learning and multimodal demonstration retrieval, and evaluates it on radiology datasets (MIMIC-CXR, CheXpert, MIMIC-III) using open-source 7B–8B LLMs. Key contributions include the layperson normalization technique, a three-component prompting framework, empirical gains in both in-domain and out-of-domain settings, and an analysis of how impression length affects evaluation metrics. The approach reduces reliance on costly fine-tuning and improves accessibility without sacrificing accuracy, offering a practical pathway for applying general-purpose (non-specialist) LLMs to specialized medical summarization tasks.
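
Multimodal demonstration retrieval, one of the three components, selects which training examples serve as in-context demonstrations. The sketch below is one plausible version under stated assumptions, not the authors' implementation: each training example is assumed to have precomputed text and image embeddings, and the two cosine similarities are fused with a weighted sum controlled by a hypothetical `alpha` parameter. The `Example` class and `retrieve` function are illustrative names; `k` plays the role of the number of in-context examples.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """One training report with its (hypothetical) precomputed embeddings."""
    findings: str
    impression: str
    layperson: str
    text_emb: list[float]
    image_emb: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_text_emb, query_image_emb, pool, k=3, alpha=0.5):
    """Return the k examples with the highest fused text/image similarity.

    alpha = 1.0 uses text only and alpha = 0.0 uses the image only; the
    weighted-sum fusion itself is an assumption, not the paper's rule.
    """
    def score(ex: Example) -> float:
        return (alpha * cosine(query_text_emb, ex.text_emb)
                + (1 - alpha) * cosine(query_image_emb, ex.image_emb))

    return sorted(pool, key=score, reverse=True)[:k]


pool = [
    Example("f1", "i1", "l1", [1.0, 0.0], [1.0, 0.0]),
    Example("f2", "i2", "l2", [0.0, 1.0], [0.0, 1.0]),
    Example("f3", "i3", "l3", [1.0, 1.0], [1.0, 0.0]),
]
top = retrieve([1.0, 0.0], [1.0, 0.0], pool, k=2)
print([ex.findings for ex in top])  # ['f1', 'f3']
```

Sweeping `alpha` between 0 and 1 is one simple way to compare text-only, image-only, and combined retrieval, in the spirit of the modality-embedding comparison in Figure 5.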

Abstract

Radiology report summarization (RRS) is crucial for patient care, requiring concise "Impressions" from detailed "Findings." This paper introduces a novel prompting strategy that enhances RRS by first generating a layperson summary. This approach normalizes key observations and simplifies complex information using non-expert communication techniques inspired by doctor-patient interactions. Combined with few-shot in-context learning, this method improves the model's ability to link general terms to specific findings. We evaluate this approach on the MIMIC-CXR, CheXpert, and MIMIC-III datasets using 7B/8B-parameter state-of-the-art open-source large language models (LLMs) such as Meta-Llama-3-8B-Instruct. Our results demonstrate improvements in summarization accuracy and accessibility, particularly in out-of-domain tests, with gains as high as 5% on some metrics.
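
The two-stage idea in the abstract (layperson summary first, expert Impression second) can be illustrated with a small prompt builder. This is a minimal sketch, not the authors' template: the field labels ("Findings:", "Layperson summary:", "Impression:"), the instruction wording, the example report text, and the helper names `build_expert_prompt` and `extract_impression` are all hypothetical. It assumes each demonstration already carries a layperson summary and that the model is asked to write a layperson summary for the test case before its Impression.

```python
# Hypothetical instruction wording; not taken from the paper.
INSTRUCTION = (
    "Summarize the radiology findings. First explain them in plain language "
    "for a patient, then write the expert Impression."
)


def format_demo(findings: str, layperson: str, impression: str) -> str:
    """Render one few-shot demonstration: Findings, layperson summary, Impression."""
    return (
        f"Findings: {findings}\n"
        f"Layperson summary: {layperson}\n"
        f"Impression: {impression}\n"
    )


def build_expert_prompt(demos: list[tuple[str, str, str]], test_findings: str) -> str:
    """Concatenate the instruction, the demonstrations, and the test Findings.

    The prompt ends at "Layperson summary:" so the model is expected to write
    the plain-language summary first and then continue with the Impression.
    """
    parts = [INSTRUCTION, ""]
    parts.extend(format_demo(*d) for d in demos)
    parts.append(f"Findings: {test_findings}\nLayperson summary:")
    return "\n".join(parts)


def extract_impression(completion: str) -> str:
    """Return the text after the last "Impression:" marker in a completion."""
    marker = "Impression:"
    idx = completion.rfind(marker)
    if idx == -1:
        return completion.strip()
    return completion[idx + len(marker):].strip()


# Made-up illustrative demonstration, not drawn from any dataset.
demos = [(
    "Heart size is normal. Lungs are clear. No pleural effusion.",
    "Your heart and lungs look healthy, with no fluid around the lungs.",
    "No acute cardiopulmonary process.",
)]
prompt = build_expert_prompt(demos, "Mild enlargement of the cardiac silhouette.")
print(prompt)
```

Placing the layperson summary between Findings and Impression in every demonstration is what lets the model imitate the "explain simply, then summarize expertly" pattern on the test input.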

Paper Structure

This paper contains 10 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Overview of the LaypersonPrompt Framework. First, we generate layperson summaries from the training corpus by prompting an LLM. Then, for a test input, we use multimodal retrieval to find relevant examples. Finally, we incorporate these layperson summaries into the prompt, applying patient-doctor communication techniques to improve the model's reasoning.
  • Figure 2: Step 1: Layperson Summarization of the Training Dataset. An illustration of the prompt used to generate layperson summaries for training examples, with disease observations highlighted in different colors. The illustration shows a single example; in the full prompt, the Instruction and Response sections are repeated for each few-shot in-context example.
  • Figure 3: Step 3: Final Expert Summary Prompt Construction. Example of a LaypersonPrompt: the final prompt, assembled after retrieving in-context examples, that is used to generate the expert summary (i.e., the Impression section).
  • Figure 4: Error Analysis on MIMIC-CXR Test Dataset: Performance Comparison of OpenChat-3.5-7B Model across Different Impression Lengths.
  • Figure 5: Validation results vs. the number of in-context examples across various prompt types and modality embeddings on OpenChat-3.5-7B.
  • ...and 2 more figures
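
Figure 4 compares performance across impression lengths. An analysis of that kind can be approximated by grouping per-example metric scores by the word count of the reference Impression and averaging within each group. This is a sketch under assumptions: the bucket edges, the use of reference (rather than generated) length, and the function name `bucket_by_length` are illustrative choices, and the per-example scores could come from any metric, such as ROUGE-L.

```python
from collections import defaultdict
from statistics import mean


def bucket_by_length(references, scores, edges=(10, 20, 40)):
    """Average per-example scores within reference-length buckets.

    Length is measured in whitespace-separated words of the reference
    Impression. An example with n words falls into the first bucket whose
    edge e satisfies n <= e, or into the overflow bucket ">last_edge".
    """
    buckets = defaultdict(list)
    for ref, score in zip(references, scores):
        n_words = len(ref.split())
        label = next((f"<={e}" for e in edges if n_words <= e), f">{edges[-1]}")
        buckets[label].append(score)
    return {label: mean(vals) for label, vals in buckets.items()}


refs = [
    "No acute process.",
    "Mild cardiomegaly without edema.",
    " ".join(["word"] * 15),
    " ".join(["word"] * 50),
]
scores = [1.0, 0.5, 0.25, 0.0]
print(bucket_by_length(refs, scores))  # {'<=10': 0.75, '<=20': 0.25, '>40': 0.0}
```

Empty buckets are simply absent from the result, which avoids reporting averages over zero examples.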