RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance

Chantal Pellegrini; Ege Özsoy; Benjamin Busam; Nassir Navab; Matthias Keicher

TL;DR

RaDialog tackles the challenge of generating clinically correct radiology reports while also supporting interactive dialog. It introduces a dual-branch large vision-language model (LVLM) that fuses image features and explicit structured findings with a large language model (LLM) fine-tuned via LoRA. A semi-automatically labeled, image-grounded instruct dataset (~580k samples across ten tasks) enables domain-specific dialog capabilities, while replay and context dropping mitigate catastrophic forgetting. The model achieves state-of-the-art clinical correctness in report generation and performs strongly on interactive tasks such as report correction and findings QA, and radiologists preferred RaDialog over the baselines. These results suggest that RaDialog is a viable foundation for clinical radiology dialog systems, offering faster inference, robust multi-task capabilities, and a public dataset to spur further research and adoption.
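
The TL;DR names replay and context dropping without detailing them. The following is a minimal sketch under our own assumptions, not RaDialog's code: replay is modeled as mixing a fraction of general-domain instruct samples into the radiology instruct data, and context dropping as randomly omitting the image or findings context from a training sample. The sample format, `mix_with_replay`, `drop_context`, `replay_ratio`, and `p_drop` are all hypothetical.

```python
import random
from typing import Dict, List

# Hypothetical sample format: a dict with optional context fields
# ("image", "findings") plus an "instruction" and a "target".
Sample = Dict[str, str]


def mix_with_replay(domain: List[Sample], general: List[Sample],
                    replay_ratio: float, seed: int = 0) -> List[Sample]:
    """Blend domain instruct samples with replayed general-domain samples.

    replay_ratio is the number of replayed samples per domain sample
    (an assumed parameter, not a value reported for RaDialog).
    """
    rng = random.Random(seed)
    n_replay = min(len(general), int(len(domain) * replay_ratio))
    mixed = list(domain) + rng.sample(general, n_replay)
    rng.shuffle(mixed)
    return mixed


def drop_context(sample: Sample, p_drop: float, rng: random.Random) -> Sample:
    """Return a copy of sample with each optional context field dropped
    independently with probability p_drop; the input is not modified."""
    out = dict(sample)
    for key in ("image", "findings"):
        if key in out and rng.random() < p_drop:
            del out[key]
    return out
```

Keeping replayed samples in the same shuffled stream, rather than in a separate phase, is one simple way to let general instruction-following data regularize domain fine-tuning at every step.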

Abstract

Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To preserve the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems. Our code is available on GitHub: https://github.com/ChantalMP/RaDialog.
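
Parameter-efficient fine-tuning with LoRA keeps the pretrained weights frozen and learns only a low-rank update. The pure-Python sketch below shows one LoRA-adapted linear layer computing y = Wx + (alpha / r) * B(Ax). It illustrates the generic LoRA formulation, not RaDialog's actual training code; the class name `LoRALinear`, the helper `matvec`, and the explicit `a_init` argument are illustrative choices.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update B @ A.

    A is r x d_in and B is d_out x r. B starts at zero, so the adapted
    layer initially reproduces the frozen layer exactly.
    """

    def __init__(self, weight: Matrix, r: int, alpha: float, a_init: Matrix):
        d_out, d_in = len(weight), len(weight[0])
        assert len(a_init) == r and all(len(row) == d_in for row in a_init)
        self.weight = weight                         # frozen pretrained weight (d_out x d_in)
        self.a = a_init                              # trainable down-projection (r x d_in)
        self.b = [[0.0] * r for _ in range(d_out)]   # trainable up-projection, zero-initialized
        self.scale = alpha / r

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)                # frozen path: W x
        low_rank = matvec(self.b, matvec(self.a, x)) # adapter path: B (A x)
        return [y + self.scale * z for y, z in zip(base, low_rank)]

    def trainable_params(self) -> int:
        """Number of adapter parameters: r * (d_in + d_out)."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])
```

Because `b` starts at zero, fine-tuning begins from the frozen model's behavior, and only r * (d_in + d_out) parameters per adapted layer are updated instead of d_in * d_out.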

Paper Structure

This paper contains 20 sections, 1 equation, 5 figures, and 8 tables.

Introduction
Methodology
Model and Training
Instruct Dataset
Experimental Setup
Results and Discussion
Radiology report generation
Ablation of Architectural Components
Interactive Downstream Tasks
Conclusion
Additional report generation results
Details on LVLM Configurations
Instruct Dataset Details
Task Descriptions
Instruction Prompts
...and 5 more sections

Figures (5)

Figure 1: Pipeline overview: The Image Encoder extracts X-ray features and transforms them via adapter module a or b. The Structured Findings Extractor extracts high-level findings. Both outputs are integrated during Prompt Construction with the conversation history and task-specific instructions to query the LLM. The predicted answer is added to the conversation history.
Figure 2: Qualitative report generation results of RaDialog-project (top) and RaDialog-align (bottom). Colors indicate matching findings in ground truth and prediction.
Figure 3: Qualitative conversation examples with RaDialog-project-ins (left) and RaDialog-align-ins (right), showing examples of correction, knowledge QA (zero-shot), easy language, and translation (zero-shot).
Figure 4: Qualitative report generation comparison of RaDialog with XrayGPT and GPT4-Vision.
Figure 5: Differences in conversation behavior of RaDialog-align-instruct and RaDialog-project-instruct in zero-shot conversational tasks.
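
The Prompt Construction step in Figure 1 combines image information, structured findings, conversation history, and a task instruction. A hypothetical sketch of that assembly as plain string building follows. The template wording, the `<IMG>` placeholder (standing in for the adapted image features), the assumption that the findings extractor yields per-finding probabilities, the 0.5 threshold, and the helper names are all assumptions; RaDialog's real prompt format may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical marker later replaced by the adapter's image embeddings.
IMAGE_PLACEHOLDER = "<IMG>"


def structured_findings(probs: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Turn assumed per-finding probabilities into a sorted list of positive findings."""
    return sorted(name for name, p in probs.items() if p >= threshold)


def build_prompt(findings: List[str], history: List[Tuple[str, str]],
                 instruction: str) -> str:
    """Assemble an LLM prompt from image tokens, findings, history, and the current instruction."""
    lines = [f"Image information: {IMAGE_PLACEHOLDER}"]
    lines.append("Predicted findings: " + (", ".join(findings) if findings else "none"))
    for user_turn, assistant_turn in history:
        lines.append(f"USER: {user_turn}")
        lines.append(f"ASSISTANT: {assistant_turn}")
    lines.append(f"USER: {instruction}")
    lines.append("ASSISTANT:")  # the LLM continues from here
    return "\n".join(lines)
```

After each model reply, the (instruction, answer) pair would be appended to `history`, mirroring the caption's note that predicted answers are added to the conversation history.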
