Act Like a Radiologist: Radiology Report Generation across Anatomical Regions

Qi Chen, Yutong Xie, Biao Wu, Xiaomin Chen, James Ang, Minh-Son To, Xiaojun Chang, Qi Wu

TL;DR

This work tackles radiology report generation across multiple anatomical regions, addressing chest-centric data limitations and semantic drift in cross-dataset deployment. It introduces X-RGen, a radiologist-minded framework that proceeds through four phases (initial observation, cross-region analysis, medical interpretation, and report formation), coupled with a general radiological knowledge base and region-aware knowledge selection. A cross-region learning objective aligns image and report representations across body parts, while a Transformer-based knowledge aggregation module and decoder generate medically informed reports; training combines a captioning objective $\mathcal{L}_{cap}$ with a weighted cross-region loss $\lambda \mathcal{L}_{x}$. Evaluations on a merged six-region X-ray dataset (including IU-Xray for chest) show that X-RGen outperforms specialised and generalist baselines on NLG metrics (BLEU, CIDEr, METEOR) and clinical measures (recall, F1), with supporting evidence from qualitative examples, CLIPScore-based semantic alignment, and CheXpert-based recognition probes. The results underscore the impact of cross-region learning and medical-knowledge integration for robust, clinically relevant radiology report generation with broader applicability beyond chest imaging.
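The training objective mentioned above can be sketched as a weighted sum of the two terms. The summary only names a captioning loss and a cross-region loss that aligns image and report representations. The specific forms below are assumptions for illustration, not the authors' implementation: token-level cross-entropy for the captioning term, a symmetric contrastive loss for the cross-region term, a temperature of 0.07, and a default $\lambda$ of 0.1. All function names are hypothetical.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def captioning_loss(token_logits, target_ids):
    """Mean negative log-likelihood of the target report tokens.

    token_logits: (T, V) decoder scores per position; target_ids: (T,) token ids.
    """
    log_probs = log_softmax(token_logits, axis=-1)
    return float(-log_probs[np.arange(len(target_ids)), target_ids].mean())


def cross_region_loss(image_emb, report_emb, temperature=0.07):
    """ASSUMED form: symmetric contrastive loss over a batch of image-report
    pairs drawn from several anatomical regions. Row i of each array is a pair.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = report_emb / np.linalg.norm(report_emb, axis=1, keepdims=True)
    sim = img @ txt.T / temperature          # (B, B) scaled cosine similarities
    idx = np.arange(sim.shape[0])
    image_to_text = -log_softmax(sim, axis=1)[idx, idx].mean()
    text_to_image = -log_softmax(sim, axis=0)[idx, idx].mean()
    return float(0.5 * (image_to_text + text_to_image))


def total_loss(l_cap, l_x, lam=0.1):
    """L = L_cap + lambda * L_x; the value of lambda here is a placeholder."""
    return l_cap + lam * l_x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    l_cap = captioning_loss(rng.normal(size=(6, 50)), rng.integers(0, 50, size=6))
    l_x = cross_region_loss(rng.normal(size=(4, 16)), rng.normal(size=(4, 16)))
    print(total_loss(l_cap, l_x))
```

Keeping the two terms as separate functions makes it easy to drop the cross-region term outside training, which matches the note in Figure 3 that cross-region analysis is only used during training.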

Abstract

Automating radiology report generation can ease the reporting workload for radiologists. However, existing works focus mainly on the chest area due to the limited availability of public datasets for other regions. Besides, they often rely on naive data-driven approaches, e.g., a basic encoder-decoder framework with captioning loss, which limits their ability to recognise complex patterns across diverse anatomical regions. To address these issues, we propose X-RGen, a radiologist-minded report generation framework across six anatomical regions. In X-RGen, we seek to mimic the behaviour of human radiologists, breaking it down into four principal phases: 1) initial observation, 2) cross-region analysis, 3) medical interpretation, and 4) report formation. Firstly, we adopt an image encoder for feature extraction, akin to a radiologist's preliminary review. Secondly, we enhance the recognition capacity of the image encoder by analysing images and reports across various regions, mimicking how radiologists gain experience and improve their professional ability from past cases. Thirdly, just as radiologists apply their expertise to interpret radiology images, we introduce radiological knowledge of multiple anatomical regions to further analyse the features from a clinical perspective. Lastly, we generate reports based on the medical-aware features using a typical auto-regressive text decoder. Both natural language generation (NLG) and clinical efficacy metrics show the effectiveness of X-RGen on six X-ray datasets. Our code and checkpoints are available at: https://github.com/YtongXie/X-RGen.

Paper Structure (40 sections, 8 equations, 5 figures, 11 tables, 1 algorithm)


Figures (5)

  • Figure 1: Reports written by radiologists compared with those generated by an existing model (R2Gen (Chen et al., 2020), trained on our merged dataset) and by our X-RGen. We observe that R2Gen reproduces some commonly used descriptions (highlighted in red) regardless of their semantic alignment with the images, e.g., the correct diagnosis is "there is moderate cardiomegaly" (highlighted in green) while R2Gen keeps "the heart is normal in size" (highlighted in red).
  • Figure 2: (a) X-RGen mimics how human radiologists write reports. (b) CIDEr scores of specialised and generalist models on different datasets.
  • Figure 3: Overview of X-RGen. We decompose the framework into four phases: 1) initial observation, 2) cross-region analysis, 3) medical interpretation, and 4) report formation. Specifically, starting with an image encoder to extract visual features, the model then enhances recognition by interacting with cross-region data. Next, it applies radiological knowledge for further medical-aware analysis, and finally, it generates reports based on the enhanced and medical-aware features. Note that the second phase (i.e., cross-region analysis, green arrows) is used only during training and is removed at inference.
  • Figure 4: Reports generated by X-RGen (ours) and two baselines: R2Gen and R2Gen$^\dagger$. R2Gen is trained on IU-Xray only, while R2Gen$^\dagger$ is optimised on our merged training set.
  • Figure 5: Examples on the private datasets. Each example contains a frontal image (first column) and another image (second column) with the corresponding radiology report.