MedCFVQA: A Causal Approach to Mitigate Modality Preference Bias in Medical Visual Question Answering

Shuchang Ye, Usman Naseem, Mingyuan Meng, Dagan Feng, Jinman Kim

TL;DR

This work addresses modality preference bias in Medical Visual Question Answering (MedVQA), where models rely on question priors rather than visual content. It introduces MedCFVQA, a counterfactual-inference framework that uses causal graphs to subtract the bias during inference, so that predictions draw on both the question and the image. To benchmark bias handling, the authors construct SLAKE-CP and RadVQA-CP via greedy re-splitting, creating train-test splits with divergent answer distributions. Experiments show that MedCFVQA outperforms its non-causal counterpart on SLAKE, RadVQA, and both CP variants, with strong gains on the CP benchmarks and qualitative examples of debiasing, suggesting improved reliability for clinical VQA scenarios.

Abstract

Medical Visual Question Answering (MedVQA) is crucial for enhancing the efficiency of clinical diagnosis by providing accurate and timely responses to clinicians' inquiries regarding medical images. Existing MedVQA models suffer from modality preference bias, where predictions are heavily dominated by one modality while the other is overlooked (in MedVQA, the question usually dominates the answer and the image is overlooked), so the models fail to learn multimodal knowledge. To overcome the modality preference bias, we propose a Medical CounterFactual VQA (MedCFVQA) model, which trains with bias and leverages causal graphs to eliminate the modality preference bias during inference. Existing MedVQA datasets exhibit substantial prior dependencies between questions and answers, which results in acceptable performance even when a model suffers significantly from the modality preference bias. To address this issue, we construct new datasets from existing MedVQA datasets by Changing the Prior dependencies (CP) between questions and their answers across the training and test sets. Extensive experiments demonstrate that MedCFVQA significantly outperforms its non-causal counterpart on the SLAKE and RadVQA datasets as well as on the SLAKE-CP and RadVQA-CP datasets.
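
The idea of changing the prior dependencies between training and test sets can be pictured with a small sketch. The authors' exact procedure is not reproduced here. What follows is a hypothetical greedy variant, assuming each sample carries a question type and an answer, in which every (question type, answer) group is assigned wholly to one split so that the answer priors seen in training differ from those seen at test time. The function name `greedy_cp_split`, the dictionary keys, the `test_fraction` parameter, and the toy data are all illustrative assumptions.

```python
from collections import defaultdict


def greedy_cp_split(samples, test_fraction):
    """Hypothetical greedy re-split: each (question type, answer) group
    is placed entirely in train or entirely in test."""
    # Group samples by question type, then by answer.
    by_type = defaultdict(lambda: defaultdict(list))
    for sample in samples:
        by_type[sample["question_type"]][sample["answer"]].append(sample)

    train, test = [], []
    for answer_groups in by_type.values():
        # Visit the largest answer groups first.
        groups = sorted(answer_groups.values(), key=len, reverse=True)
        budget = test_fraction * sum(len(g) for g in groups)
        n_test = 0
        for group in groups:
            # Greedy choice: send the group to test only if it fits the budget.
            if n_test + len(group) <= budget:
                test.extend(group)
                n_test += len(group)
            else:
                train.extend(group)
    return train, test


# Toy data invented for illustration: one question type, three answers.
samples = (
    [{"question_type": "modality", "answer": "CT"}] * 6
    + [{"question_type": "modality", "answer": "MRI"}] * 3
    + [{"question_type": "modality", "answer": "X-ray"}] * 1
)
train, test = greedy_cp_split(samples, test_fraction=0.35)
train_answers = {s["answer"] for s in train}
test_answers = {s["answer"] for s in test}
print(sorted(train_answers), sorted(test_answers))
```

In this toy split no answer for the question type appears in both sets, so a model that only memorizes question-answer priors from training gains nothing from them at test time.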

Paper Structure

This paper contains 8 sections, 3 figures, 1 table, 1 algorithm.

Figures (3)

  • Figure 1: Integration of causal graph and neural network: The left panel illustrates the causal graph, while the right panel depicts the neural network architecture. An asterisk (*) indicates that the item is counterfactual. $x \rightarrow y$ represents the direct causal effect of $x$ on $y$, while $x \rightarrow m \rightarrow y$ represents the indirect causal effect of $x$ on $y$.
  • Figure 2: Visualization of data redistribution outcomes. The x-axis denotes the question type, while the bars indicate the proportion of answers corresponding to each question type.
  • Figure 3: Visualization of the MedCFVQA counterfactual inference process (biased prediction - bias = debiased prediction) for mitigating modality preference bias.
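
The subtraction described in the Figure 3 caption can be sketched numerically. This is a minimal illustration, assuming the biased prediction and the bias are both available as per-answer scores. The scores, answer labels, and function names below are hypothetical, and how MedCFVQA actually computes each term is defined by its causal graph rather than by this sketch.

```python
from typing import Dict


def debias(biased: Dict[str, float], bias: Dict[str, float]) -> Dict[str, float]:
    """Subtract the bias score from the biased score for every answer."""
    return {answer: score - bias.get(answer, 0.0) for answer, score in biased.items()}


def predict(scores: Dict[str, float]) -> str:
    """Return the answer with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical scores for a yes/no question.
# `biased` stands for the prediction made from question and image together;
# `bias` stands for the part attributable to the question alone.
biased = {"yes": 2.0, "no": 1.5}
bias = {"yes": 1.8, "no": 0.2}

debiased = debias(biased, bias)
print(predict(biased))    # answer before removing the bias
print(predict(debiased))  # answer after removing the bias
```

With these made-up numbers the question prior pushes the biased prediction toward "yes", and removing it flips the answer to "no", which is the kind of correction the figure visualizes.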