Contrastive Learning with Counterfactual Explanations for Radiology Report Generation
Mingjie Li, Haokun Lin, Liang Qiu, Xiaodan Liang, Ling Chen, Abdulmotaleb Elsaddik, Xiaojun Chang
TL;DR
CoFE addresses the data bias in radiology report generation that arises because images share most of their anatomy and lesion content. It uses counterfactual explanations to build counterfactual images by swapping patches between semantically similar samples with different diagnosis labels, and it uses a learnable prompt to fine-tune a GPT-2 Medium language model for report generation. Training uses a joint objective that combines $L_{IRC}$, $L_{RG}$, and $L_{CF}$. The resulting framework learns non-spurious visual representations and reports superior descriptive and clinical efficacy metrics on IU-Xray and MIMIC-CXR. The work supports more interpretable, clinically aligned report generation and illustrates the practical value of counterfactual reasoning in medical vision-language tasks.
Abstract
Because radiology images share much of their anatomical content, the images and their corresponding reports are highly similar to one another. This inherent data bias can predispose automatic report generation models to learn entangled and spurious representations, resulting in misdiagnostic reports. To tackle this, we propose a novel \textbf{Co}unter\textbf{F}actual \textbf{E}xplanations-based framework (CoFE) for radiology report generation. Counterfactual explanations serve as a potent tool for understanding how decisions made by algorithms can be changed by posing ``what if'' scenarios. By leveraging this concept, CoFE learns non-spurious visual representations by contrasting the representations of factual and counterfactual images. Specifically, we derive counterfactual images by swapping patches between positive and negative samples until a predicted diagnosis shift occurs. Here, the positive and negative samples are those that are most semantically similar yet have different diagnosis labels. Additionally, CoFE employs a learnable prompt to efficiently fine-tune the pre-trained large language model; the prompt encapsulates both factual and counterfactual content to provide a more generalizable prompt representation. Extensive experiments on two benchmarks demonstrate that leveraging counterfactual explanations enables CoFE to generate semantically coherent and factually complete reports and to achieve superior language generation and clinical efficacy metrics.
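
The patch-swapping step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `make_counterfactual`, the raster-order swap schedule, the fixed square patch size, and the toy classifier `toy_predict` are all assumptions made for illustration. The abstract only states that patches are swapped between a positive and a negative sample until the predicted diagnosis shifts.

```python
import numpy as np


def make_counterfactual(factual, negative, predict, patch=4):
    """Copy patches from `negative` into `factual` until `predict` changes.

    Hypothetical helper: patches are visited in raster order (an assumption).
    Returns (counterfactual, swapped_coords); counterfactual is None when the
    predicted label never shifts.
    """
    assert factual.shape == negative.shape
    original_label = predict(factual)
    candidate = factual.copy()  # leave the factual image untouched
    swapped = []
    height, width = factual.shape[:2]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Overwrite one patch of the candidate with the negative's patch.
            candidate[top:top + patch, left:left + patch] = \
                negative[top:top + patch, left:left + patch]
            swapped.append((top, left))
            # Stop as soon as the predicted diagnosis shifts.
            if predict(candidate) != original_label:
                return candidate, swapped
    return None, swapped


def toy_predict(image):
    """Toy stand-in for a diagnosis classifier (assumption, not a real model):
    predicts 1 when the bottom-right quadrant is bright, else 0."""
    return int(image[4:, 4:].mean() > 0.5)


# Toy data: the two 8x8 images differ only in the bottom-right quadrant.
factual = np.zeros((8, 8))
negative = np.zeros((8, 8))
negative[4:, 4:] = 1.0

counterfactual, swapped = make_counterfactual(factual, negative, toy_predict)
print(swapped)
```

In this toy setup the label flips only once the swapped region covers the area the classifier relies on, which is the sense in which the swapped patches act as an explanation of the prediction. How the factual and counterfactual representations are then contrasted is not specified in the abstract and is left out of the sketch.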
