SLaVA-CXR: Small Language and Vision Assistant for Chest X-ray Report Automation

Jinge Wu; Yunsoo Kim; Daqian Shi; David Cliffton; Fenglin Liu; Honghan Wu

TL;DR

SLaVA-CXR is an open-source Small Language and Vision Assistant for chest X-ray report automation. The paper also introduces RADEX, a data synthesis method that generates a high-quality, diverse training corpus while complying with privacy regulations.

Abstract

Inspired by the success of large language models (LLMs), there is growing research interest in developing LLMs for the medical domain to assist clinicians. For hospitals, however, using closed-source commercial LLMs raises privacy concerns, while developing open-source public LLMs requires large-scale computational resources, which are usually limited, especially in resource-constrained regions and low-income countries. We propose an open-source Small Language and Vision Assistant (SLaVA-CXR) for chest X-ray report automation. To train a small assistant efficiently, we first propose the Re$^3$Training method, which simulates the cognitive development of radiologists and optimizes the model through three training stages: Recognition, Reasoning, and Reporting. We then introduce a data synthesis method, RADEX, which generates a high-quality and diverse training corpus while complying with privacy regulations. Extensive experiments show that SLaVA-CXR, built on a 2.7B backbone, not only outperforms previous state-of-the-art larger models but also achieves 6 times faster inference.

Paper Structure (32 sections, 3 equations, 4 figures, 9 tables)


Introduction
Related Works
Re$^3$Training
Recognition
Reasoning
Reporting
RADiology EXpertise Corpus (RADEX)
Experiments
Experiment Setup
Evaluation Data.
Task Description.
Baseline Models.
Evaluation Metrics.
Automatic Evaluation
Generation Results.
...and 17 more sections

Figures (4)

Figure 1: The proposed Re$^3$Training pipeline. Stage 1: The Recognition stage aims to improve model capacity in aligning clinical concepts between two modalities; Stage 2: The Reasoning stage aims to capture CXR image nuances; Stage 3: The Reporting stage is for learning the clinical notes of CXRs.
Figure 2: An example from the RADEX corpus.
Figure 3: Human evaluation of our method and LLaVA-Med on correctness, completeness, and coherence.
Figure 4: Qualitative analysis of model outputs. Blue-colored text denotes alignment between the ground truth text and the generated text. Red-colored text denotes unfavorable results.
