
Anatomically-Grounded Fact Checking of Automated Chest X-ray Reports

R. Mahmood, K. C. L. Wong, D. M. Reyes, N. D'Souza, L. Shi, J. Wu, P. Kaviani, M. Kalra, G. Wang, P. Yan, T. Syeda-Mahmood

TL;DR

The paper tackles hallucinations in automated chest X-ray report generation by proposing an anatomically-grounded fact-checking framework. It builds a synthetic image-FFL dataset and trains a multi-label cross-modal contrastive regression network to detect real versus fake findings and localize them to anatomical regions, enabling explainable error detection and LLM-assisted correction. Across multiple datasets and report generators, the method achieves strong real/fake and grounding performance and yields about a 40% improvement in corrected report quality. This approach offers a practical path toward safer, more reliable radiology report generation in clinical workflows, with potential applicability to broader medico-visual tasks.
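To make the explanation step more concrete, here is a minimal Python sketch. It assumes the FC model returns two things for each finding sentence: a probability that the finding is real and a regressed bounding box. The names FCOutput, explain and correction_prompt, the box format, the 0.5 probability and 0.3 IoU thresholds, and the prompt wording are all hypothetical and are not taken from the paper. Findings with a low real-probability are flagged as finding errors. Findings judged real, but whose predicted box overlaps poorly with the location indicated in the report, are flagged as location errors. The notes are then packed into a prompt for a report-corrector LLM. Figure 3 places absent findings at the <0,0> coordinate, so degenerate (zero-area) boxes are treated as matching only when they are identical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); hypothetical box format


@dataclass
class FCOutput:
    """Hypothetical per-sentence output of a fact-checking (FC) model."""
    sentence: str
    p_real: float        # predicted probability that the finding is real
    predicted_box: Box   # location regressed by the FC model
    reported_box: Box    # location indicated by the generated report


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    if union <= 0:
        # Degenerate boxes (e.g. absent findings placed at the origin) match only if identical.
        return 1.0 if a == b else 0.0
    return inter / union


def explain(outputs: List[FCOutput], p_thresh: float = 0.5, iou_thresh: float = 0.3) -> List[str]:
    """Turn FC outputs into human-readable error notes (thresholds are illustrative)."""
    notes = []
    for o in outputs:
        if o.p_real < p_thresh:
            notes.append(f"FINDING ERROR: '{o.sentence}' is likely not supported by the image.")
        elif iou(o.predicted_box, o.reported_box) < iou_thresh:
            notes.append(f"LOCATION ERROR: '{o.sentence}' likely refers to the wrong region.")
    return notes


def correction_prompt(report: str, notes: List[str]) -> str:
    """Build a text prompt for a (hypothetical) report-corrector LLM."""
    issues = "\n".join(f"- {n}" for n in notes) or "- No errors flagged."
    return (
        "Revise the chest X-ray report below, fixing only the flagged issues.\n"
        f"Flagged issues:\n{issues}\n\nReport:\n{report}"
    )
```

Calling correction_prompt(report, explain(outputs)) yields a plain text prompt. This sketch leaves open which LLM consumes the prompt and how its answer is checked.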

Abstract

With the emergence of large-scale vision-language models, realistic radiology reports can be generated from medical images alone, guided by simple prompts. However, their practical utility has been limited by factual errors in their descriptions of findings. In this paper, we propose a novel model for explainable fact-checking that identifies errors in the findings described in a report and in the locations the report indicates for them. Specifically, we analyze the types of errors made by automated reporting methods and, from a ground-truth dataset, derive a new synthetic dataset of images paired with real and fake descriptions of findings and their locations. A new multi-label cross-modal contrastive regression network is then trained on this dataset. We evaluate the resulting fact-checking model, and its utility in correcting reports generated by several state-of-the-art (SOTA) automated reporting tools, on a variety of benchmark datasets. The results point to over 40% improvement in report quality through such error detection and correction.
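The abstract does not spell out how fake descriptions are derived from the ground truth. As a rough illustration only, the Python sketch below represents each ground-truth finding description as a (finding, location, present) triple. It creates fakes with three hypothetical perturbations: substituting the finding, substituting the location, and flipping presence (negation). The vocabularies, the triple representation, the perturbation types and the function names make_fakes and build_dataset are assumptions, not the paper's error taxonomy. Real samples get label 1 and fakes get label 0.

```python
import random
from typing import List, Tuple

# Hypothetical vocabularies, for illustration only.
FINDINGS = ["pleural effusion", "pneumothorax", "consolidation", "cardiomegaly", "atelectasis"]
LOCATIONS = ["left lower lobe", "right lower lobe", "left upper lobe", "right upper lobe", "cardiac silhouette"]

FFL = Tuple[str, str, bool]  # assumed representation: (finding, location, is_present)


def make_fakes(real: FFL, rng: random.Random) -> List[FFL]:
    """Derive fake descriptions from one real one via three assumed error types."""
    finding, location, present = real
    other_finding = rng.choice([f for f in FINDINGS if f != finding])
    other_location = rng.choice([loc for loc in LOCATIONS if loc != location])
    return [
        (other_finding, location, present),  # wrong finding, same location
        (finding, other_location, present),  # same finding, wrong location
        (finding, location, not present),    # flipped presence / negation
    ]


def build_dataset(image_id: str, ground_truth: List[FFL], seed: int = 0):
    """Pair an image with real (label 1) and synthetic fake (label 0) descriptions."""
    rng = random.Random(seed)  # seeded for reproducible synthetic data
    samples = []
    for ffl in ground_truth:
        samples.append((image_id, ffl, 1))
        for fake in make_fakes(ffl, rng):
            # Skip perturbations that accidentally reproduce another true finding.
            if fake not in ground_truth:
                samples.append((image_id, fake, 0))
    return samples
```

With one ground-truth description, this yields one real sample and three fakes. A real pipeline would also need to render each triple back into report-style text before it reaches the text encoder.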

Paper Structure

This paper contains 7 sections, 6 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Illustration of errors in radiology reporting. (a) Ground-truth report. (b) Report generated by XrayGPT [xraygpt]. (c) Report corrected by our method. The sentence with an erroneous finding is colored orange in (b), and the corrected sentence is shown in green in (c).
  • Figure 2: Illustration of a fact-checking system for clinical workflows. An automatically generated report is evaluated by the fact-checking (FC) model, which produces an explanation documenting finding errors and localization issues. A report-corrector LLM then uses the fact-checking results and the original report to produce a corrected report.
  • Figure 3: Illustration of fact-checking on automatically generated reports. Five cases are shown, including one with no error, which our FC model flags as a real finding. For absent findings (e.g., case 4), the predicted and ground-truth locations are placed at the <0,0> coordinate, as explained in the text. The predicted finding location is shown in green, the ground-truth location in red, and the location indicated in the automated report in yellow/orange.
  • Figure 4: Illustration of FC model training using real and synthetic FFL patterns drawn from ground-truth reports.
  • Figure 5: Illustration of the architecture of our FC model, which consists of a contrastive encoder and a regression network. Real samples are taken as positives and fake samples as negatives in the contrastive formulation. The loss functions for the encoder and the regressor are also shown in the figure.
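Figure 5 states only that real samples act as positives and fake ones as negatives, with separate losses for the encoder and the regressor. The paper's equations are not reproduced in this summary, so the pure-Python sketch below is a generic stand-in, not the authors' formulation. It combines two terms. The first is an InfoNCE-style multi-label contrastive loss that pulls an image embedding toward the embeddings of its real finding descriptions and away from the fake ones. The second is an L1 box-regression loss applied only to real samples. The temperature tau, the weight lam and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def multilabel_contrastive_loss(img, texts, labels, tau: float = 0.1) -> float:
    """Image anchor vs. several description embeddings (1 = real/positive, 0 = fake/negative).
    Averages an InfoNCE-style log-softmax term over every positive (multi-label)."""
    logits = [cosine(img, t) / tau for t in texts]
    m = max(logits)  # log-sum-exp with max subtraction for numerical stability
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    pos = [z for z, y in zip(logits, labels) if y == 1]
    if not pos:
        return 0.0
    return -sum(z - log_denom for z in pos) / len(pos)


def box_l1_loss(pred: List[Sequence[float]], target: List[Sequence[float]], labels) -> float:
    """Mean L1 error over box coordinates, counted only for real samples."""
    terms = [
        sum(abs(p - t) for p, t in zip(pb, tb))
        for pb, tb, y in zip(pred, target, labels)
        if y == 1
    ]
    return sum(terms) / len(terms) if terms else 0.0


def total_loss(img, texts, labels, pred_boxes, gt_boxes, lam: float = 1.0, tau: float = 0.1) -> float:
    """Hypothetical combined objective: contrastive term plus weighted regression term."""
    return multilabel_contrastive_loss(img, texts, labels, tau) + lam * box_l1_loss(pred_boxes, gt_boxes, labels)
```

Masking the regression term to real samples reflects the intuition that a fake description has no true location to regress toward. Whether the paper does the same is not stated in this summary.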