Unpacking Hateful Memes: Presupposed Context and False Claims

Weibin Cai, Jiayu Li, Reza Zafarani

TL;DR

This work identifies presupposed context and false claims as the core expressive mechanisms of hateful memes and builds SHIELD, a framework that unifies a Presupposed Context Module (PCM) with a False Claims Module (FACT). PCM encodes intra-modal context and fuses cross-modal cues to produce a context embedding, while FACT combines a Social Perception Module (SPM) leveraging external knowledge via a fine-tuned LLM and a Cross-modal Reference Module (CRM) that constructs a cross-modal reference graph processed by a GNN to yield a reference embedding. The classifier then concatenates PCM, SPM, and CRM representations to detect hate, with theoretical analysis of the reference graph’s discriminative properties. Empirically, SHIELD outperforms strong baselines on three hateful meme datasets and proves versatile by transferring to fake-news classification, demonstrating robust generalization across domains. The work offers a theory-grounded approach to hate detection that integrates philosophical and psychological insights with multimodal learning to address societal harms from hateful memes.

Abstract

While memes are often humorous, they are frequently used to disseminate hate, causing serious harm to individuals and society. Current approaches to hateful meme detection mainly rely on pre-trained language models. However, less attention has been paid to what makes a meme hateful. Drawing on insights from philosophy and psychology, we argue that hateful memes are characterized by two essential features: a presupposed context and the expression of false claims. To capture presupposed context, we develop PCM for modeling contextual information across modalities. To detect false claims, we introduce the FACT module, which integrates external knowledge and harnesses cross-modal reference graphs. By combining PCM and FACT, we introduce SHIELD, a hateful meme detection framework designed to capture the fundamental nature of hate. Extensive experiments show that SHIELD outperforms state-of-the-art methods across datasets and metrics, while demonstrating versatility on other tasks, such as fake news detection.
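As a rough illustration of the final classification step described in the TL;DR (concatenating the PCM, SPM, and CRM representations), the sketch below fuses three embeddings and scores them with a single linear layer and a sigmoid. The embedding sizes, the random weights, and the function name `fuse_and_classify` are assumptions made for illustration; the paper's actual classifier head may differ.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fuse_and_classify(h_pc, h_sp, h_cr, weight, bias):
    """Concatenate the three embeddings and apply a linear layer + sigmoid.

    h_pc: context embedding (PCM), h_sp: social perception embedding (SPM),
    h_cr: reference graph embedding (CRM). The single linear layer is an
    assumption, not the paper's confirmed head.
    """
    fused = np.concatenate([h_pc, h_sp, h_cr])
    logit = float(fused @ weight + bias)
    return sigmoid(logit)


# Toy inputs: the dimensions 8, 16, and 4 are arbitrary placeholders.
rng = np.random.default_rng(0)
h_pc = rng.normal(size=8)
h_sp = rng.normal(size=16)
h_cr = rng.normal(size=4)
weight = rng.normal(size=h_pc.size + h_sp.size + h_cr.size) * 0.1
bias = 0.0

prob = fuse_and_classify(h_pc, h_sp, h_cr, weight, bias)
print(f"P(hateful) = {prob:.3f}")
```

Concatenation keeps each module's signal separate until the last layer, so the contribution of any one embedding can be ablated by zeroing its slice of `weight`.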

Paper Structure

This paper contains 28 sections, 1 theorem, 10 equations, 4 figures, 8 tables.

Key Result

Theorem 4.1

The reference graph is discriminative if: where $\mathbf{e}_{v_0}$ is a one-hot vector indicating node $v_0$, $\delta = -2 h_{v_0}$, and $P = \prod_t W^{(t)}$ denotes the product of layer-wise GCN weights $W^{(t)}$.
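The condition of the theorem is not reproduced above, so the sketch below illustrates only the quantity $P = \prod_t W^{(t)}$. It assumes a linearized GCN, i.e. propagation $H^{(t+1)} = \hat{A} H^{(t)} W^{(t)}$ with no nonlinearity, a symmetric normalization with self-loops, and a left-to-right product order; these are assumptions for illustration and may not match the paper's exact setting. Under them, a $T$-layer network collapses to $\hat{A}^T X P$.

```python
from functools import reduce

import numpy as np


def normalized_adjacency(adj):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def linear_gcn(a_norm, x, weights):
    """Apply H <- A_norm @ H @ W layer by layer (no activation; assumption)."""
    h = x
    for w in weights:
        h = a_norm @ h @ w
    return h


rng = np.random.default_rng(0)
# Toy undirected graph with 4 nodes (hypothetical, not from the paper).
adj = np.array(
    [[0, 1, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 0],
     [0, 1, 0, 0]],
    dtype=float,
)
x = rng.normal(size=(4, 5))                       # node features
weights = [rng.normal(size=(5, 6)), rng.normal(size=(6, 3))]

a_norm = normalized_adjacency(adj)
P = reduce(np.matmul, weights)                    # P = W^(0) @ W^(1)

layered = linear_gcn(a_norm, x, weights)
collapsed = np.linalg.matrix_power(a_norm, len(weights)) @ x @ P
print(np.allclose(layered, collapsed))
```

Because the weights enter only through `P` in this linear setting, any statement about the network's output can be phrased in terms of the graph structure and the single matrix `P`.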

Figures (4)

  • Figure 1: An example of a hateful meme. Meme text: good guy police officer, capturing them young.
  • Figure 2: Examples illustrating how memes express hate through presupposed context and false claims. Text in Figures \ref{fig:example1} and \ref{fig:example2}: "good guy police officer, capturing them young." Blue boxes mark elements reflecting the presupposed context; green boxes mark entities portrayed as "good", while red boxes denote those portrayed as "bad"; yellow boxes indicate referential links between text and image. In Figures \ref{fig:example1} and \ref{fig:example2}, "capturing" reflects a presupposed evaluative context in which the white police officer is framed as the "good guy", while "them young" refers to the Black kid as the bad group. Figure \ref{fig:example1} also contains false claims: (1) Incorrectness: "capturing" contradicts the friendly handshake in the image; (2) Deliberately misleading: it perpetuates the stereotype that Black kids are inherently criminal. These elements make the meme hateful. In contrast, Figure \ref{fig:example2} lacks explicit false claims; the presupposed context alone is insufficient to classify it as hateful. Figure \ref{fig:example6}, though containing a false claim, i.e., "baked potatoes" for a volcanic eruption, lacks a presupposed context and is better seen as dark humor.
  • Figure 3: SHIELD Framework. Given a meme image and its text, we first obtain patch and token embeddings $H_v$ and $H_t$ from the image encoder $IE(\cdot)$ and the text encoder $TE(\cdot)$. (1) $H_v$ and $H_t$ are fed into PCM, where image and text context encoders perform intra-modal interactions to extract modality-specific context embeddings. These are then fused by a context fusion module to produce $h_{PC}$, which contains the context information of the meme. (2) Meanwhile, $H_v$, $H_t$, and a prompt $\mathcal{P}$ are passed to the FACT module. The LLM, guided by $\mathcal{P}$, produces a last hidden state $h_{SP}$ and an attention matrix. From this attention matrix, a cross-modal reference graph is constructed and then processed by a GNN to generate the reference graph embedding $h_{CR}$. Flame icons indicate modules with trainable parameters, while snowflakes indicate frozen parameters.
  • Figure 4: Examples of two context types of hateful memes. In Figure \ref{fig:example7}, Islam is framed as the bad group. The words "violent" and "kill" are reinforced by the image, which portrays a defiant, confrontational crowd, amplifying the negative portrayal through visual alignment. In contrast, Figure \ref{fig:example8} uses the positive term "respect" to refer to an execution scene featuring nooses and guillotines. The opposing sentiment between text and image conveys ironic humor.
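
The Figure 3 caption describes building a cross-modal reference graph from the LLM's attention matrix and passing it through a GNN to obtain $h_{CR}$. The sketch below shows one plausible way to do this; the construction rule is not specified in this summary, so the thresholding of text-to-image attention, the single mean-aggregation layer, the mean pooling, and all names (`build_reference_graph`, `graph_embedding`, `threshold`) are assumptions for illustration only.

```python
import numpy as np


def build_reference_graph(attn, num_text, num_patches, threshold=0.2):
    """Build a symmetric 0/1 adjacency with cross-modal edges only.

    attn is a square attention matrix whose rows/columns are ordered as
    [text tokens, image patches]. An edge links token i and patch j when
    the token's attention to that patch reaches `threshold` (assumed rule).
    """
    n = num_text + num_patches
    assert attn.shape == (n, n)
    cross = attn[:num_text, num_text:]            # text -> image block
    links = (cross >= threshold).astype(float)
    adj = np.zeros((n, n))
    adj[:num_text, num_text:] = links
    adj[num_text:, :num_text] = links.T
    return adj


def graph_embedding(adj, feats, weight):
    """One mean-aggregation graph layer with ReLU, then mean pooling."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)
    hidden = np.maximum(a_norm @ feats @ weight, 0.0)
    return hidden.mean(axis=0)


# Toy attention over 2 text tokens and 3 image patches (made-up values).
attn = np.array([
    [0.50, 0.10, 0.30, 0.05, 0.05],
    [0.10, 0.38, 0.05, 0.25, 0.22],
    [0.20, 0.20, 0.20, 0.20, 0.20],
    [0.20, 0.20, 0.20, 0.20, 0.20],
    [0.20, 0.20, 0.20, 0.20, 0.20],
])
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 6))                   # node features (tokens + patches)
weight = rng.normal(size=(6, 4))                  # layer weights

adj = build_reference_graph(attn, num_text=2, num_patches=3)
h_cr = graph_embedding(adj, feats, weight)
print(adj)
print(h_cr.shape)
```

Restricting edges to the text-to-image block keeps the graph focused on referential links between modalities, which is the role the caption assigns to the reference graph.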

Theorems & Definitions (1)

  • Theorem 4.1