Causal Intersectionality and Dual Form of Gradient Descent for Multimodal Analysis: a Case Study on Hateful Memes

Yosuke Miyanishi, Minh Le Nguyen

TL;DR

Hateful meme detection can be viewed as ATE estimation under intersectionality principles, and summarized gradient-based attention scores highlight distinct behaviors of three Transformer models.

Abstract

Amidst the rapid expansion of Machine Learning (ML) and Large Language Models (LLMs), understanding the semantics within their mechanisms is vital. Causal analyses define semantics, while gradient-based methods are essential to eXplainable AI (XAI), interpreting the model's 'black box'. Integrating these, we investigate how a model's mechanisms reveal its causal effect on evidence-based decision-making. Research indicates intersectionality - the combined impact of an individual's demographics - can be framed as an Average Treatment Effect (ATE). This paper demonstrates that hateful meme detection can be viewed as an ATE estimation using intersectionality principles, and summarized gradient-based attention scores highlight distinct behaviors of three Transformer models. We further reveal that LLM Llama-2 can discern the intersectional aspects of the detection through in-context learning and that the learning process could be explained via meta-gradient, a secondary form of gradient. In conclusion, this work furthers the dialogue on Causality and XAI. Our code is available online (see External Resources section).
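
To make the ATE framing concrete, the following is a minimal sketch of a generic average treatment effect computed as a difference in mean outcomes. It is not the paper's miATE definition, which is not reproduced in this summary. The function name `average_treatment_effect`, the variable names, and all scores are hypothetical; the scores stand in for a classifier's hatefulness probabilities on original memes (treated) and on their benign confounders (control).

```python
from statistics import mean


def average_treatment_effect(treated_scores, control_scores):
    """Generic ATE: mean outcome of the treated group minus that of the control group."""
    return mean(treated_scores) - mean(control_scores)


# Hypothetical hatefulness probabilities (made-up numbers, not results from the paper).
hateful_scores = [0.9, 0.8, 0.7]     # original image-text pairs
confounder_scores = [0.2, 0.3, 0.1]  # benign confounders with one modality swapped

ate = average_treatment_effect(hateful_scores, confounder_scores)
print(f"Estimated ATE: {ate:.2f}")
```

Under this reading, a larger positive value would suggest that the model's output depends more strongly on the combination of image and text than on either modality alone.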

Paper Structure (56 sections, 13 equations, 13 figures, 7 tables)

Figures (13)

  • Figure 1: Visualization of a hateful meme and its corresponding confounders. (top) Meme samples and (bottom) their directed acyclic graph representation. (left) A hateful meme highlights cross-modal interactions between its image and text components that contribute to its hatefulness. (middle) The image benign confounder showcases original text and a non-hateful image, resulting in reduced cross-modal interactions and a benign classification. (right) The text benign confounder comprises an original image and non-hateful text. Note: The samples depicted are illustrative and do not exist in the dataset. ©Getty Images
  • Figure 2: A schematic overview of our proposed methodology. Rectangular boxes denote data or models, while circular shapes represent the processes involved.
  • Figure 3: Multimodal Intersectional Average Treatment Effect ($miATE$) across Oscar (O; left), UNITER (U; middle), and VisualBERT (V; right) models, contrasting the samples with original image confounders (org. image, cyan) and those with original text confounders (org. text, magenta).
  • Figure 4: $MIDAS$ for org. image (left) and org. text (right) samples featuring Oscar (top), UNITER (second row), VisualBERT (third row), and VisualBERT with text-only-pretrained encoder (bottom). From left to right, each graph displays $attr$ with no modality division, $MIDAS_{within\_text}$, $MIDAS_{within\_image}$, and $MIDAS_{cross\_modal}$.
  • Figure 5: Conceptual portrayal of hateful, text benign, and image benign samples derived from UNITER. $MIDAS$ reflects heightened $attr_{cross\_modal}$ (green), $attr_{within\_image}$ (red), or $attr_{within\_text}$ (blue) values. Both image and text inputs spotlight top-scored ROIs and tokens. The text is abbreviated for clarity.
  • ...and 8 more figures