Seeing Through VisualBERT: A Causal Adventure on Memetic Landscapes

Dibyanayan Bandyopadhyay, Mohammed Hasanuzzaman, Asif Ekbal

TL;DR

Surprisingly, input attribution methods do not guarantee causality within the SCM framework, which raises questions about their reliability in safety-critical applications.

Abstract

Detecting offensive memes is crucial, yet standard deep neural network systems often remain opaque. Various input attribution-based methods attempt to interpret their behavior, but they face challenges with implicitly offensive memes and non-causal attributions. To address these issues, we propose a framework based on a Structural Causal Model (SCM). In this framework, VisualBERT is trained to predict the class of an input meme based on both meme input and causal concepts, allowing for transparent interpretation. Our qualitative evaluation demonstrates the framework's effectiveness in understanding model behavior, particularly in determining whether the model was right due to the right reason, and in identifying reasons behind misclassification. Additionally, quantitative analysis assesses the significance of proposed modelling choices, such as de-confounding, adversarial learning, and dynamic routing, and compares them with input attribution methods. Surprisingly, we find that input attribution methods do not guarantee causality within our framework, raising questions about their reliability in safety-critical applications. The project page is at: https://newcodevelop.github.io/causality_adventure/

Paper Structure

This paper contains 35 sections, 4 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: The underlying notion of these memes is racism, which is implicit. None of the input attribution methods could decipher this notion from the input alone.
  • Figure 2: Left: The causal process is illustrated by an SCM. Right: Causal intervention selectively intervenes on a concept $c_i$ to nullify its effect on the model. This generates the intermediate counterfactual representation $I^{CF_i}$. To measure the causal effect of concept $c_i$, we take the individual treatment effect (ITE) as $|y'-y|$. Dotted blue lines denote that the meme content representations ($t$, $v$), along with $E_1$, generate the causal concepts $c_i$. The X marks the causal intervention, i.e. breaking the link between ($t$, $v$) and $c_i$, realized by setting $w_i=0$.
  • Figure 3: Model architecture comprising VisualBERT, a dynamic routing layer and a gradient reversal layer. 0/1: non-offensive/offensive.
  • Figure 4: Comparison of the mean $\widehat{RITE}$ score with and without de-confounding.
  • Figure 5: Memes of Table \ref{tab:error-analysis}.
  • ...and 4 more figures
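
The intervention described in the Figure 2 caption (set $w_i=0$ for one concept, then compare the new prediction $y'$ with the original $y$) can be illustrated with a minimal sketch. Everything here other than the $|y'-y|$ definition and the $w_i=0$ intervention is an assumption made for illustration: the toy `predict` function, the scalar content and concept scores, and the example weights are hypothetical and do not reproduce the paper's VisualBERT-based model.

```python
import math


def predict(content_score, concept_scores, weights):
    """Toy stand-in for the classifier (hypothetical, not the paper's model).

    Returns a probability from a sigmoid over the content score plus the
    weighted sum of the concept scores.
    """
    logit = content_score + sum(w * c for w, c in zip(weights, concept_scores))
    return 1.0 / (1.0 + math.exp(-logit))


def individual_treatment_effect(content_score, concept_scores, weights, i):
    """ITE of concept i, computed as |y' - y|.

    y  is the prediction with the original weights.
    y' is the prediction after the intervention w_i = 0, which removes
       concept i's contribution while leaving everything else unchanged.
    """
    y = predict(content_score, concept_scores, weights)
    intervened = list(weights)  # copy so the caller's weights are untouched
    intervened[i] = 0.0
    y_cf = predict(content_score, concept_scores, intervened)
    return abs(y_cf - y)


if __name__ == "__main__":
    # Hypothetical concept names, scores and weights, for illustration only.
    concepts = ["concept_a", "concept_b", "concept_c"]
    scores = [1.0, 1.0, 1.0]
    weights = [0.8, 0.0, -0.5]
    for idx, name in enumerate(concepts):
        ite = individual_treatment_effect(0.2, scores, weights, idx)
        print(f"ITE({name}) = {ite:.4f}")
```

In this sketch a concept whose weight is already zero has an ITE of exactly zero, since the intervention changes nothing, while a concept with a larger absolute weight moves the prediction more when it is switched off.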