MEPNet: Medical Entity-balanced Prompting Network for Brain CT Report Generation

Xiaodan Zhang, Yanzhao Shi, Junzhong Ji, Chengxin Zheng, Liangqiong Qu

TL;DR

MEPNet tackles biased learning of diverse medical entities in brain CT report generation by introducing entity-level visual embeddings and learning-status cues that are integrated into a multi-modal prompting framework for LLMs. It features a Knowledge-driven Joint Attention module that fuses explicit and implicit medical knowledge with scan features via cross attention and knowledge-masked self attention, and a Learning Status Scorer that produces status words used as prompts. The approach integrates entity embeddings, status embeddings, and scan embeddings into a structured multi-modal prompt, enabling the LLM to balance entity learning during generation. Experiments on BCT-CHR and CTRG-Brain show state-of-the-art results in both natural language and clinical evaluation metrics, demonstrating improved accuracy, completeness, and coherence in brain CT reports and illustrating potential applicability to other 3D medical reporting tasks.
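The knowledge-masked self attention mentioned above can be illustrated with a generic masked self-attention sketch. This is not the authors' implementation: how the mask is derived from medical knowledge is not described here, so the binary `mask` argument is an assumption (for instance, `mask[i][j] == 1` when entities `i` and `j` are related in some knowledge source), and the function name `masked_self_attention` is hypothetical. Learned query, key, and value projections are omitted to keep the example short.

```python
import math


def masked_self_attention(x, mask):
    """Single-head scaled dot-product self-attention with a binary mask.

    x:    list of n token vectors (each a list of d floats).
    mask: n x n list of 0/1 values (assumed input); mask[i][j] == 1 lets
          token i attend to token j.
    Queries, keys, and values are all x itself (no learned projections).
    """
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        # Scaled dot-product scores; disallowed pairs are set to -inf.
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d)
            if mask[i][j] else float("-inf")
            for j in range(n)
        ]
        top = max(scores)
        if top == float("-inf"):
            raise ValueError(f"row {i} of the mask allows no positions")
        # Numerically stable softmax; masked positions get weight 0.0.
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the value vectors.
        out.append(
            [sum(w * x[j][k] for j, w in enumerate(weights)) for k in range(d)]
        )
    return out


# Three toy "entity" vectors; the third is hidden from the first two.
tokens = [[1.0, 0.0], [0.0, 1.0], [100.0, 100.0]]
allowed = [[1, 1, 0], [1, 1, 0], [1, 1, 1]]
mixed = masked_self_attention(tokens, allowed)
```

With this mask, the first two output rows are mixtures of the first two tokens only, so the large third vector cannot influence them.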

Abstract

The automatic generation of brain CT reports has gained widespread attention, given its potential to assist radiologists in diagnosing cranial diseases. However, brain CT scans involve extensive medical entities, such as diverse anatomical regions and lesions, that exhibit highly inconsistent spatial patterns in 3D volumetric space. This leads to biased learning of medical entities in existing methods, resulting in repetitive and inaccurate generated reports. To this end, we propose a Medical Entity-balanced Prompting Network (MEPNet), which harnesses a large language model (LLM) to fairly interpret various entities for accurate brain CT report generation. By introducing the visual embeddings and the learning status of medical entities as enriched clues, our method prompts the LLM to balance the learning of diverse entities, thereby enhancing reports with comprehensive findings. First, to extract visual embeddings of entities, we propose Knowledge-driven Joint Attention to explore and distill entity patterns using both explicit and implicit medical knowledge. Then, a Learning Status Scorer is designed to evaluate the learning of entity visual embeddings, yielding a distinct learning status for each entity. Finally, these entity visual embeddings and statuses are integrated into multi-modal prompts to guide the LLM's text generation. This process allows the LLM to self-adapt its learning for entities with biased fitting, thereby covering detailed findings in generated reports. We conduct experiments on two brain CT report generation benchmarks, demonstrating the method's effectiveness in clinical accuracy and text coherence.

Paper Structure

This paper contains 25 sections, 10 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: (a) shows an example of biased entity learning, where different medical entities in brain CT scans exhibit distinct learning losses, indicating biased learning statuses. (b) compares F1 scores of different MRG models for covering various medical entities in generated reports. Our method shows the narrowest box with a higher median F1 score, indicating our effectiveness in balanced and accurate learning.
  • Figure 2: The framework of MEPNet, with two branches: Multi-scan Visual Prompting, which processes visual information of brain CT scans, and Entity-balanced Prompting, which mines entity visual embeddings and corresponding learning status for achieving balanced entity learning within LLM. These branches collaboratively prompt the LLM to generate diagnostic reports.
  • Figure 3: Template of the multi-modal prompt. Different colors are utilized to represent distinct components.
  • Figure 4: Details of the Knowledge-driven Joint Attention.
  • Figure 5: Matching rules of status words and status scores.
  • ...and 1 more figure
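
The Figure 5 caption refers to matching rules between status scores and status words. The actual rules are not reproduced in this listing, so the following is only a minimal sketch of how such a mapping could look. The thresholds, the status words, the entity names, and the function names `status_word` and `build_status_prompt` are all hypothetical, and scores are assumed to lie in [0, 1]. The real prompt also carries entity and scan embeddings, which this text-only example leaves out.

```python
from bisect import bisect_right

# Hypothetical score boundaries and status words (assumptions, not from the paper).
THRESHOLDS = [0.33, 0.66]
STATUS_WORDS = ["poorly learned", "partially learned", "well learned"]


def status_word(score):
    """Map a status score in [0, 1] to one of the hypothetical status words."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    # bisect_right counts how many thresholds are <= score.
    return STATUS_WORDS[bisect_right(THRESHOLDS, score)]


def build_status_prompt(entity_scores):
    """Render the textual status part of a prompt from {entity: score}."""
    parts = [f"{name}: {status_word(s)}" for name, s in entity_scores.items()]
    return "Entity learning status: " + "; ".join(parts)


# Illustrative entity names and scores only.
prompt = build_status_prompt({"hematoma": 0.2, "ventricle": 0.8})
```

Keeping the thresholds in a sorted list makes it easy to add further status levels without changing the lookup logic.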