ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data

Tianyang Zhong, Wei Zhao, Yutong Zhang, Yi Pan, Peixin Dong, Zuowei Jiang, Xiaoyan Kui, Youlan Shang, Li Yang, Yaonai Wei, Longtao Yang, Hao Chen, Huan Zhao, Yuxiao Liu, Ning Zhu, Yiwei Li, Yisong Wang, Jiaqi Yao, Jiaqi Wang, Ying Zeng, Lei He, Chao Zheng, Zhixue Zhang, Ming Li, Zhengliang Liu, Haixing Dai, Zihao Wu, Lu Zhang, Shu Zhang, Xiaoyan Cai, Xintao Hu, Shijie Zhao, Xi Jiang, Xin Zhang, Xiang Li, Dajiang Zhu, Lei Guo, Dinggang Shen, Junwei Han, Tianming Liu, Jun Liu, Tuo Zhang

TL;DR

This work tackles cross-institution and cross-system heterogeneity in radiology report generation by presenting ChatRadio-Valuer, a Llama 2-based model fine-tuned on radiology reports from a single institution and evaluated on a large multi-institution, multi-system dataset. It introduces a five-phase framework combining data acquisition, preprocessing, abstract feature extraction, report generation, and expert evaluation, enhanced by prompt engineering and parameter-efficient fine-tuning (LoRA with 4-bit quantization). Across six institutions and five body systems, ChatRadio-Valuer generalizes well across institutions and systems, outperforming state-of-the-art LLMs in disease-impression generation and clinical utility while keeping deployment costs favorable. Expert radiologist evaluations corroborate its practical value, suggesting potential to reduce annotation workload and standardize radiology impressions in real-world clinical settings.

Abstract

Radiology report generation, a key step in medical image analysis, is critical to quantitative analysis in clinically informed decision-making. However, radiology reports are complex and diverse, and their cross-source heterogeneity poses a major generalizability challenge to current methods at large data volumes, mainly because the style and normativity of reports differ markedly among institutions, inspected body regions, and radiologists. Recently, the advent of large language models (LLMs) has offered great potential for recognizing signs of health conditions. To address this problem, we collaborate with the Second Xiangya Hospital in China and propose ChatRadio-Valuer, an LLM-based model tailored for automatic radiology report generation that learns generalizable representations and provides a basis pattern for model adaptation in sophisticated analysts' cases. Specifically, ChatRadio-Valuer is trained on radiology reports from a single institution by supervised fine-tuning and then adapted to disease diagnosis tasks for human multi-system evaluation (i.e., chest, abdomen, muscle-skeleton, head, and maxillofacial & neck) from six different institutions in clinical-level events. The clinical dataset used in this study comprises a total of 332,673 observations. Comprehensive results on engineering indicators, clinical efficacy, and deployment cost metrics show that ChatRadio-Valuer consistently outperforms state-of-the-art models, notably ChatGPT (GPT-3.5-Turbo) and GPT-4, in disease diagnosis from radiology reports. ChatRadio-Valuer thus provides an effective avenue to boost model generalization performance and alleviate experts' annotation workload, promoting clinical AI applications for radiology reports.

Paper Structure (29 sections, 6 equations, 13 figures, 8 tables, 3 algorithms)

This paper contains 29 sections, 6 equations, 13 figures, 8 tables, 3 algorithms.

Figures (13)

  • Figure 1: Overall framework of the proposed method for radiology report generation. Multi-institution and multi-system clinical radiology reports are acquired in phase 1. In phase 2, the data are systematically preprocessed and the samples are synthesized into high-quality prompts. Generalizable advanced features are extracted and applied for clinical utility in phases 3 and 4. Comprehensive evaluations of ChatRadio-Valuer's efficacy are carried out in phase 5.
  • Figure 2: An example radiology report and its attributes. These attributes are manually diagnosed and described by radiologists at different levels, and their styles vary noticeably.
  • Figure 3: The architecture diagram of Llama 2. The model structure of Llama 2 is largely consistent with the standard Transformer decoder and is mainly composed of 32 Transformer blocks.
  • Figure 4: Prompt generation overview. The overall framework contains three parts (system description, instruction, and input), which together constitute a prompt. Within a prompt example (purple), the expert instruction and input data on its right are inserted into the {Expert Instruction} and {Input Data} slots, respectively. The derived impression goes in the {Output Impression} slot.
  • Figure 5: LLM pool for model selection. For the medical application scenario, six aspects are considered: domain adaptability, compatibility with medical standards, bilingual support, open-source availability, parameter efficiency, and cost and licensing. 15 SOTA LLMs (12 baseline models, 3 fine-tuning pairs, and 1 fine-tuned ChatGLM2-6B) from 10 organizations together make up the LLM pool.
  • ...and 8 more figures
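
The three-part prompt layout described in the Figure 4 caption can be illustrated with a minimal sketch. Only the three parts (system description, instruction, input) and the slot names {Expert Instruction}, {Input Data}, and {Output Impression} come from the caption; the section markers, function names, and example strings below are hypothetical and are not the authors' actual template.

```python
# Minimal sketch of a three-part prompt (system description, instruction, input).
# ASSUMPTION: the "### ..." markers and all wording are illustrative only;
# the paper's real template text is not given in this section.
PROMPT_TEMPLATE = (
    "{system_description}\n\n"
    "### Instruction:\n{expert_instruction}\n\n"
    "### Input:\n{input_data}\n\n"
    "### Output:\n"
)


def build_prompt(system_description: str, expert_instruction: str, input_data: str) -> str:
    """Fill the {Expert Instruction} and {Input Data} slots for inference."""
    return PROMPT_TEMPLATE.format(
        system_description=system_description.strip(),
        expert_instruction=expert_instruction.strip(),
        input_data=input_data.strip(),
    )


def build_training_example(
    system_description: str,
    expert_instruction: str,
    input_data: str,
    output_impression: str,
) -> str:
    """Append the {Output Impression} target for supervised fine-tuning."""
    prompt = build_prompt(system_description, expert_instruction, input_data)
    return prompt + output_impression.strip()


if __name__ == "__main__":
    # Hypothetical example strings, not taken from the dataset.
    example = build_training_example(
        system_description="You are an assistant that writes radiology impressions.",
        expert_instruction="Derive the impression from the findings below.",
        input_data="Findings: (report findings text goes here)",
        output_impression="Impression: (radiologist-written impression goes here)",
    )
    print(example)
```

At inference time only `build_prompt` would be used, leaving the output slot empty for the model to complete; during fine-tuning the impression is appended as the target text.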