Enhancing LLMs for Impression Generation in Radiology Reports through a Multi-Agent System

Fang Zeng, Zhiliang Lyu, Quanzheng Li, Xiang Li

TL;DR

RadCouncil presents a three-agent, retrieval-grounded framework to generate radiology report impressions from findings. By integrating a Report Retriever, a Radiologist, and a Reviewer, the system grounds impressions in exemplar reports and cross-checks for consistency. Evaluations using BLEU, ROUGE, and BERTScore, plus GPT-4o qualitative scoring on chest X-ray data, show RadCouncil outperforming a single-agent baseline in diagnostic quality and stylistic alignment, though some inconsistencies arise due to expanded context from retrieved exemplars. The Reviewer component helps mitigate errors, highlighting the potential and remaining challenges of multi-agent medical AI systems for reliable documentation.

Abstract

This study introduces "RadCouncil," a multi-agent Large Language Model (LLM) framework designed to improve the generation of impressions in radiology reports from the findings section. RadCouncil comprises three specialized agents: 1) a "Retrieval" Agent that identifies and retrieves similar reports from a vector database, 2) a "Radiologist" Agent that generates impressions based on the findings section of the given report plus the exemplar reports retrieved by the Retrieval Agent, and 3) a "Reviewer" Agent that evaluates the generated impressions and provides feedback. RadCouncil was evaluated using both quantitative metrics (BLEU, ROUGE, BERTScore) and qualitative criteria assessed by GPT-4, using chest X-ray as a case study. Experimental results show that RadCouncil improves on the single-agent approach across multiple dimensions, including diagnostic accuracy, stylistic concordance, and clarity. This study highlights the potential of multiple interacting LLM agents, each with a dedicated task, to enhance performance in specialized medical tasks and to support the development of more robust and adaptable healthcare AI solutions.
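
The three-agent flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Report`, `retrieve`, and `run_council` are hypothetical, a bag-of-words cosine similarity stands in for the vector database, the Radiologist and Reviewer are passed in as plain callables where a real system would call an LLM, and the bounded revise-on-feedback loop is an assumption about how the Reviewer's feedback might be used.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Report:
    findings: str
    impression: str


def _bag_of_words(text: str) -> Counter:
    # Toy stand-in for an embedding model (assumption, not from the paper).
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(findings: str, database: List[Report], k: int = 2) -> List[Report]:
    """Retrieval agent: return the k reports whose findings are most similar."""
    query = _bag_of_words(findings)
    ranked = sorted(
        database,
        key=lambda r: _cosine(query, _bag_of_words(r.findings)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical agent signatures; in practice each would wrap an LLM call.
DraftFn = Callable[[str, List[Report], str], str]   # findings, exemplars, feedback
ReviewFn = Callable[[str, str], Tuple[bool, str]]   # findings, impression


def run_council(
    findings: str,
    database: List[Report],
    draft: DraftFn,
    review: ReviewFn,
    k: int = 2,
    max_rounds: int = 3,
) -> str:
    """Retrieve exemplars, draft an impression, and revise on reviewer feedback."""
    exemplars = retrieve(findings, database, k)
    feedback = ""
    impression = ""
    for _ in range(max_rounds):  # bounded loop is an assumption of this sketch
        impression = draft(findings, exemplars, feedback)
        approved, feedback = review(findings, impression)
        if approved:
            break
    return impression
```

Keeping the agents as injected callables separates the orchestration from any particular model or prompt, so the same loop can be exercised with simple stubs before a real LLM is attached.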

Paper Structure

This paper contains 15 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Diagram of the proposed RadCouncil framework, illustrating the interactions between the three agents, the report database, and the user.
  • Figure 2: Example comparison of the original impression generated by the radiologist agent alone vs. the impression enhanced by the report retriever agent.