Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention

Ying Liu, Ge Bai, Chenji Lu, Shilong Li, Zhang Zhang, Ruifang Liu, Wenbin Guo

TL;DR

CIBi, a novel causal intervention training scheme, eliminates language bias from a finer-grained perspective: it uses causal intervention and contrastive learning to eliminate context bias and improve the multi-modal representation.

Abstract

Despite the remarkable advancements in Visual Question Answering (VQA), the challenge of mitigating the language bias introduced by textual information remains unresolved. Previous approaches capture language bias from a coarse-grained perspective. However, finer-grained information within a sentence, such as context and keywords, can give rise to different biases. Because they overlook this fine-grained information, most existing methods fail to capture language bias sufficiently. In this paper, we propose a novel causal intervention training scheme named CIBi to eliminate language bias from a finer-grained perspective. Specifically, we divide language bias into context bias and keyword bias. We employ causal intervention and contrastive learning to eliminate context bias and improve the multi-modal representation. Additionally, we design a new question-only branch based on counterfactual generation to distill and eliminate keyword bias. Experimental results show that CIBi is applicable to various VQA models and yields competitive performance.

Paper Structure

This paper contains 17 sections, 10 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Causal graphs of VQA model.
  • Figure 2: An overview of CIBi. (a) shows the architecture of a base VQA model. (b) illustrates the training scheme of CIBi. (c) shows our cause-effect look at language bias in VQA.
  • Figure 3: Sensitivity of VQA accuracy on VQA-CP v2 test.
  • Figure 4: The answer distributions on the VQA-CP v2. CIBi is based on RUBi.