
CRRG-CLIP: Automatic Generation of Chest Radiology Reports and Classification of Chest Radiographs

Jianfei Xu, Thanet Markchom, Huizhi Liang

TL;DR

CRRG-CLIP introduces an end-to-end Chest Radiology Report Generation and Radiograph Classification framework that integrates region-focused report generation with multimodal image–text classification. The RRG module detects and selects informative anatomical regions and generates per-region sentences via GPT-2, while the R-CLIP module aligns image and report embeddings with self-supervised contrastive losses to enable robust classification from radiographs and generated reports. Experiments on multi-source chest radiography data demonstrate competitive report quality against high-performance baselines and strong classification performance, even with limited labeled data, and show the generated reports can support downstream diagnostic tasks. This approach advances interpretability, data efficiency, and practical utility in radiology by combining localized visual reasoning with multimodal learning to assist report writing and disease detection in chest radiographs.
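The image-report alignment described above follows the CLIP recipe. The sketch below implements the standard symmetric CLIP (InfoNCE) objective in NumPy. Paired radiograph and report embeddings sit on the diagonal of a similarity matrix, and every other entry in the batch serves as a negative. It makes several assumptions and is not the authors' R-CLIP code. The encoders are omitted, the function names are hypothetical, and the temperature of 0.07 is borrowed from the original CLIP initialization rather than taken from this paper. The paper's exact combination of contrastive losses is not specified here.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along one axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(image_emb: np.ndarray, text_emb: np.ndarray,
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss for N paired (radiograph, report) embeddings.

    Row i of image_emb and row i of text_emb are assumed to be the same study
    (the positive pair); all other rows in the batch act as negatives.
    temperature=0.07 is an assumed value (CLIP's initial setting).
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature          # (N, N) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    # Image -> report: each row should put its probability mass on the diagonal.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Report -> image: each column should put its probability mass on the diagonal.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_i2t + loss_t2i))


# Usage: stand-in embeddings where each "report" is a noisy copy of its "radiograph".
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 32))
txt = img + 0.1 * rng.normal(size=(8, 32))
print(clip_contrastive_loss(img, txt) < clip_contrastive_loss(img, txt[::-1]))  # True
```

Because the objective needs only matched pairs, it can be trained on radiograph-report pairs without disease labels. Correctly paired batches give a much lower loss than mismatched ones, as the usage line shows.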

Abstract

The complexity of stacked imaging and the massive number of radiographs make writing radiology reports difficult and inefficient. Even highly experienced radiologists struggle to maintain accuracy and consistency when interpreting radiographs during prolonged, high-intensity work. To address these issues, this work proposes the CRRG-CLIP Model (Chest Radiology Report Generation and Radiograph Classification Model), an end-to-end model for automated report generation and radiograph classification. The model consists of two modules: a radiology report generation module and a radiograph classification module. The generation module uses Faster R-CNN to identify anatomical regions in radiographs, a binary classifier to select key regions, and GPT-2 to generate semantically coherent reports. The classification module uses the unsupervised Contrastive Language-Image Pre-training (CLIP) model to address the high cost of labelled datasets and the problem of insufficient features. The results show that the generation module performs comparably to high-performance baseline models on the BLEU, METEOR, and ROUGE-L metrics, and outperforms the GPT-4o model on BLEU-2, BLEU-3, BLEU-4, and ROUGE-L. The classification module significantly surpasses the state-of-the-art model in AUC and accuracy. These results indicate that the proposed model achieves high accuracy, readability, and fluency in report generation, and that multimodal contrastive training with unlabelled radiograph-report pairs enhances classification performance.
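The generation module's control flow can be sketched in plain Python. It detects regions, keeps the informative ones, and writes one sentence per kept region. In this minimal sketch, placeholders stand in for the Faster R-CNN detector and the GPT-2 decoder. `Region`, `select_regions`, and `assemble_report` are hypothetical names. A precomputed logit per region represents the selection classifier. The 0.5 probability threshold is an assumption, not a value reported by the authors.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Region:
    """One detected anatomical region (hypothetical container, not the authors' API)."""
    name: str                        # region label from the detector, e.g. "right lung"
    logit: float                     # raw score from the binary region-selection classifier
    features: Sequence[float] = ()   # visual features a real decoder would condition on


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def select_regions(regions: List[Region], threshold: float = 0.5) -> List[Region]:
    """Keep regions whose selection probability exceeds the (assumed) threshold."""
    return [r for r in regions if sigmoid(r.logit) > threshold]


def assemble_report(
    regions: List[Region],
    generate_sentence: Callable[[Region], str],
    threshold: float = 0.5,
) -> str:
    """Generate one sentence per selected region and join them into a report."""
    return " ".join(generate_sentence(r) for r in select_regions(regions, threshold))


# Usage with a stub in place of GPT-2.
def stub_decoder(region: Region) -> str:
    return f"The {region.name} is unremarkable."


regions = [Region("right lung", 2.1), Region("left lung", 1.4), Region("spine", -3.0)]
print(assemble_report(regions, stub_decoder))
# The right lung is unremarkable. The left lung is unremarkable.
```

In the actual module, the sentence generator would be GPT-2 conditioned on the selected region's visual features. The stub only shows where that call sits and how per-region sentences are joined into a single report.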

Paper Structure

This paper contains 27 sections, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: CRRG-CLIP Model Architecture.
  • Figure 2: Radiology Report Generation Module Architecture.
  • Figure 3: Radiograph Classification Module Architecture.
  • Figure 4: Example of a Result Generated by the Radiology Report Generation Module.