Contrastive Learning with Counterfactual Explanations for Radiology Report Generation

Mingjie Li, Haokun Lin, Liang Qiu, Xiaodan Liang, Ling Chen, Abdulmotaleb Elsaddik, Xiaojun Chang

TL;DR

CoFE addresses the data bias in radiology report generation that arises from shared anatomy and lesions. It uses counterfactual explanations to build counterfactual images by swapping patches between similar but differently labeled samples, and pairs them with a learnable prompt to fine-tune a GPT-2 Medium language model for report generation. Training uses a joint objective that includes $L_{IRC}$, $L_{RG}$, and $L_{CF}$. The approach learns non-spurious visual representations and achieves superior descriptive and clinical efficacy metrics on IU-Xray and MIMIC-CXR. The work advances interpretable, clinically aligned report generation and demonstrates the practical impact of counterfactual reasoning in medical vision-language tasks.

Abstract

Because radiology images share common anatomical content, the images and their corresponding reports exhibit high similarity. This inherent data bias can predispose automatic report generation models to learn entangled and spurious representations, resulting in misdiagnostic reports. To tackle this, we propose a novel CounterFactual Explanations-based framework (CoFE) for radiology report generation. Counterfactual explanations serve as a potent tool for understanding how decisions made by algorithms can be changed by posing "what if" scenarios. By leveraging this concept, CoFE learns non-spurious visual representations by contrasting the representations of factual and counterfactual images. Specifically, we derive counterfactual images by swapping a patch between positive and negative samples until a predicted diagnosis shift occurs. Here, the positive and negative samples are those that are most semantically similar but have different diagnosis labels. Additionally, CoFE employs a learnable prompt to efficiently fine-tune the pre-trained large language model, encapsulating both factual and counterfactual content to provide a more generalizable prompt representation. Extensive experiments on two benchmarks demonstrate that leveraging counterfactual explanations enables CoFE to generate semantically coherent and factually complete reports and to achieve superior language generation and clinical efficacy metrics.
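The patch-swapping step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the patch order, the one-directional copy from the negative image into the factual one, the `build_counterfactual` helper, and the toy classifier `toy_predict` are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Patch = Sequence[float]   # one flattened image patch
Image = List[Patch]       # an image as an ordered list of patches


def build_counterfactual(
    factual: Image,
    negative: Image,
    predict: Callable[[Image], int],
) -> Tuple[Image, Optional[int]]:
    """Copy patches from `negative` into a copy of `factual`, one at a time,
    until the predicted label changes.

    Returns the modified image and the index of the patch whose swap caused
    the shift (None if the prediction never changed).
    """
    if len(factual) != len(negative):
        raise ValueError("both images must have the same number of patches")
    original_label = predict(factual)
    candidate = list(factual)  # shallow copy; the input list is left untouched
    for idx in range(len(candidate)):
        candidate[idx] = negative[idx]
        if predict(candidate) != original_label:
            return candidate, idx
    return candidate, None


def toy_predict(image: Image) -> int:
    # Hypothetical stand-in classifier: label 1 if mean intensity exceeds 0.5.
    values = [v for patch in image for v in patch]
    return int(sum(values) / len(values) > 0.5)


factual_image = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
negative_image = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
counterfactual, shift_index = build_counterfactual(
    factual_image, negative_image, toy_predict
)
```

In this toy setup the prediction flips once the second patch has been replaced, so `shift_index` is 1. A real system would use a trained diagnosis classifier for `predict` and would likely choose the swap order more carefully than a left-to-right scan.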

Paper Structure (17 sections, 7 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: A conceptual overview of our proposed counterfactual explanations (CEs). CEs help to construct a counterfactual image by iteratively exchanging a patch between the factual (positive) and negative images until a predicted diagnosis shift occurs. In this instance, the red box covering the heart is identified as the critical region that causes the diagnosis shift.
  • Figure 2: Illustration of our proposed CounterFactual Explanations-based framework (CoFE). CoFE consists of two unimodal encoders, one cross-modal encoder, one language decoder, and our proposed counterfactual generation module, which constructs a counterfactual image and a learnable prompt. The entire framework is trained through joint optimization, mainly employing contrastive learning paradigms for radiology report generation.
  • Figure 3: Illustration of the negative sampling strategy. The objective is to select, from the data bank, the negative sample that is most similar in semantics but carries a different diagnostic label.
  • Figure 4: Illustration of the counterfactual generation process, including a counterfactual image and a learnable prompt.
  • Figure 5: Illustration of reports generated by R2Gen, DCL, and our CoFE. The text in blue shows the ground truth diagnosis labels. The red text marks the accurately matched abnormalities.
  • ...and 1 more figure
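
The negative sampling strategy in the Figure 3 caption (pick the most semantically similar sample with a different diagnostic label) can be sketched as follows. This is an illustrative assumption rather than the paper's code: the use of cosine similarity over embedding vectors, the single-label setup, and the names `select_negative` and `data_bank` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_negative(
    query_embedding: Sequence[float],
    query_label: str,
    bank: List[Tuple[Sequence[float], str]],
) -> Optional[int]:
    """Return the index of the bank entry that is most similar to the query
    among entries whose label differs from the query's label.
    Returns None if every entry shares the query's label."""
    best_index: Optional[int] = None
    best_score = -math.inf
    for index, (embedding, label) in enumerate(bank):
        if label == query_label:
            continue  # same diagnosis, so it cannot serve as a negative
        score = cosine_similarity(query_embedding, embedding)
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Toy data bank of (embedding, label) pairs; values are made up.
data_bank = [
    ([1.0, 0.0], "cardiomegaly"),
    ([0.9, 0.1], "normal"),
    ([0.0, 1.0], "normal"),
]
negative_index = select_negative([1.0, 0.0], "cardiomegaly", data_bank)
```

Here the first entry is skipped because it shares the query's label, and the second entry is chosen because it is closer to the query than the third.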