Rethinking Radiology Report Generation via Causal Inspired Counterfactual Augmentation

Xiao Song, Jiafan Liu, Yun Li, Yan Liu, Wenbin Lei, Ruxin Wang

TL;DR

This work reframes Radiology Report Generation (RRG) as a causal inference problem, showing that disease co-occurrence in biased data acts as a confounder through two backdoor paths: the Joint Vision Coupling ($C_v$) and the Conditional Sequential Coupling ($C_s$). It introduces a model-agnostic counterfactual augmentation framework with two strategies, Prototype-based Counterfactual Sample Synthesis (P-CSS) and Magic-Cube-like Counterfactual Report Reconstruction (Cube), to break visual and sequential spurious correlations. Empirical results on MIMIC-CXR and IU X-Ray show consistent gains in Clinical Efficacy and competitive gains in natural language metrics, with the combination of both strategies delivering the strongest improvements and better generalization. The approach improves the trustworthiness of RRG by promoting causal discrimination between diseases and robust sentence generation, offering a practical, plug-and-play tool for future radiology AI systems.

Abstract

Radiology Report Generation (RRG) has drawn attention as a vision-and-language task in the biomedical field. Previous works inherited the approach of traditional language generation tasks, aiming to generate highly readable paragraphs as reports. Despite significant progress, the independence between diseases, a property specific to RRG, was neglected. As a result, models are confused by the co-occurrence of diseases induced by the biased data distribution and generate inaccurate reports. In this paper, to rethink this issue, we first model the causal effects between the variables from a causal perspective, through which we prove that the co-occurrence relationships between diseases under the biased distribution function as confounders, degrading accuracy through two backdoor paths, i.e. the Joint Vision Coupling and the Conditional Sequential Coupling. Then, we propose a novel model-agnostic counterfactual augmentation method that contains two strategies, i.e. the Prototype-based Counterfactual Sample Synthesis (P-CSS) and the Magic-Cube-like Counterfactual Report Reconstruction (Cube), to intervene on the backdoor paths, thus enhancing the accuracy and generalization of RRG models. Experimental results on the widely used MIMIC-CXR dataset demonstrate the effectiveness of our proposed method. Additionally, generalization performance is evaluated on the IU X-Ray dataset, which verifies that our method can effectively reduce the impact of co-occurrences caused by different distributions on the results.

Paper Structure (29 sections, 12 equations, 4 figures, 3 tables, 1 algorithm)

Figures (4)

  • Figure 1: (a) shows the SCM graph under the Joint Vision Coupling $C_v$ and the Conditional Sequential Coupling $C_s$. (b) and (c) are examples of the joint vision coupling and the conditional sequential coupling, respectively.
  • Figure 2: Overview of our proposed counterfactual augmentation method, which contains (a) the Prototype-based Counterfactual Sample Synthesis (P-CSS) and (b) the Magic-Cube-like Counterfactual Report Reconstruction (Cube). (a) first prepares the Class-wise Visual Prototype Matrix $PM$ and the Sentence-level Class Labeling $Y_{y:l}$, then conducts counterfactual cross-modal intervention on the training set. (b) randomly disturbs the order of sentences in the report.
  • Figure 3: Hyper-parameter settings for P-CSS and Cube. (a) compares different ratios of P-CSS in the training set on the CE (F1-score) metric. (b) compares different thresholds for establishing the class-wise prototype matrix on the CE metric. (c) compares different ratios of Cube in the training set on the NLG (BLEU-4) and CE metrics. The red lines show performance on the intervened training set, and the grey lines show performance on the original set.
  • Figure 4: Comparison of ground-truth reports with reports generated by the baseline model and by the model applying our method on the MIMIC-CXR dataset. The colored serial numbers mark sentences describing abnormal findings. The underlined sentences contain information that is not present in the radiographs.
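
The caption of Figure 2(b) describes Cube as randomly disturbing the order of sentences in a report, and Figure 3(c) varies the ratio of Cube samples in the training set. The following is a minimal sketch of that idea only, not the authors' implementation. The function names (`split_sentences`, `shuffle_report`, `augment`), the punctuation-based sentence splitter, and the choice to replace a report in place rather than append a shuffled copy are all assumptions made for illustration.

```python
from __future__ import annotations

import random
import re


def split_sentences(report: str) -> list[str]:
    """Split a report into sentences on terminal punctuation.

    This splitter is a simplification (hypothetical helper); real reports
    may need a clinical-text-aware sentence tokenizer.
    """
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


def shuffle_report(report: str, rng: random.Random) -> str:
    """Return the report with its sentences in a random order."""
    sentences = split_sentences(report)
    rng.shuffle(sentences)  # in-place permutation of the sentence list
    return " ".join(sentences)


def augment(reports: list[str], ratio: float, seed: int = 0) -> list[str]:
    """Shuffle each report with probability `ratio`; leave the rest as-is.

    Assumption: shuffled reports replace the originals. The paper may
    instead add them alongside the original samples.
    """
    rng = random.Random(seed)
    return [shuffle_report(r, rng) if rng.random() < ratio else r for r in reports]


if __name__ == "__main__":
    example = (
        "The heart is enlarged. "
        "There is a small pleural effusion. "
        "No pneumothorax."
    )
    print(augment([example], ratio=1.0))
```

Shuffling keeps every sentence intact and changes only their order, so the set of findings in a report is unchanged while any fixed sentence-to-sentence ordering the model could memorize is weakened.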