Deciphering Hate: Identifying Hateful Memes and Their Targets

Eftekhar Hossain; Omar Sharif; Mohammed Moshiul Hoque; Sarah M. Preum

TL;DR

This work tackles hateful meme detection and target identification in Bengali, a low-resource language. It introduces the Bengali Hateful Memes (BHM) dataset of 7,148 memes with dual annotations (hateful vs. not hateful; target category) and proposes DORA (Dual cO attention fRAmework), which fuses visual and textual cues through a Vision-guided Attentive Representation (VGAR) and a Text-guided Attentive Representation (TGAR) for improved multimodal reasoning. DORA outperforms nine baselines on both tasks, with gains of up to 13% in macro $F1$-score, and transfers to related Bengali and Hindi meme datasets, indicating cross-language generalizability. Together, the dataset and method advance hate-speech research in low-resource languages and provide resources for intervention and content filtering of multimodal social media content.
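Macro $F1$ computes an $F1$-score for each class separately and averages them with equal weight, so a rare target category counts as much as a frequent one. The following is a minimal pure-Python sketch of that metric, not the authors' evaluation code; the function name `macro_f1` and the toy label lists are illustrative assumptions, using the four target categories named in the abstract.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Undefined precision/recall (no predictions or no gold items) is treated as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical predictions for the target-identification task.
y_true = ["Individual", "Organization", "Community", "Society", "Individual", "Community"]
y_pred = ["Individual", "Organization", "Individual", "Society", "Individual", "Community"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.867
```

Here the per-class scores are 2/3 (Community), 4/5 (Individual), 1 (Organization), and 1 (Society), so the macro average is 13/15; a single misclassified Community meme pulls the score down noticeably even though five of six predictions are correct.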

Abstract

Internet memes have become a powerful means for individuals to express emotions, thoughts, and perspectives on social media. While often considered a source of humor and entertainment, memes can also disseminate hateful content targeting individuals or communities. Most existing research focuses on the negative aspects of memes in high-resource languages, overlooking the distinctive challenges associated with low-resource languages like Bengali (also known as Bangla). Furthermore, while previous work on Bengali memes has focused on detecting hateful memes, there has been no work on detecting their targeted entities. To bridge this gap and facilitate research in this arena, we introduce a novel multimodal dataset for Bengali, BHM (Bengali Hateful Memes). The dataset consists of 7,148 memes with Bengali as well as code-mixed captions, tailored for two tasks: (i) detecting hateful memes, and (ii) detecting the social entities they target (i.e., Individual, Organization, Community, and Society). To solve these tasks, we propose DORA (Dual cO attention fRAmework), a multimodal deep neural network that systematically extracts the significant modality features from the memes and jointly evaluates them with the modality-specific features to understand the context better. Our experiments show that DORA is generalizable on other low-resource hateful meme datasets and outperforms several state-of-the-art rivaling baselines.

Paper Structure

This paper contains 27 sections, 7 figures, and 7 tables.

Introduction
Related Work
BHM: A New Benchmark Dataset
Data Collection and Sampling
Dataset Annotation
Definition of Categories
Annotation Process
Dataset Statistics
Methodology
Feature Extractor
Dual Co-Attention
Experiments
Baselines
Unimodal Models
Multimodal Models
...and 12 more sections

Figures (7)

Figure 1: Example of hateful memes with associated targets. The first meme directly refers to a telecom organization as a bandit, and the second one deliberately attacks a religious community.
Figure 2: A simplified view of our proposed Dual Co-Attention Framework (DORA). The upper block represents the visual feature extractor, and the lower block represents the textual feature extractor. The Dual Co-Attention block takes the encoded visual and textual representations and generates two attentive vectors: VGAR (Vision-guided Attentive Representation) and TGAR (Text-guided Attentive Representation). Finally, our method generates a richer multimodal representation by concatenating the attentive vectors with the individual modality-specific features.
Figure 3: Examples (a) and (b) show memes where DORA yields better predictions, and example (c) illustrates a misclassified sample. The symbols ✓ and ✗ indicate correct and incorrect predictions, respectively.
Figure B.1: A few examples of hateful memes and their targets from the BHM dataset. The targets were assigned because the memes (a) demean a person, (b) attack the sexual orientation of a community (BTS Fanbase), (c) describe some organizations as robbers, and (d) denigrate the people of a particular region.
Figure B.2: Distribution of data sources. Each cell represents the number and percentage of samples collected from the corresponding sources.
...and 2 more figures
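The fusion step described in the Figure 2 caption can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: it assumes both encoders already output features projected to a shared dimension d, models each guided representation as scaled dot-product cross-attention followed by mean pooling, and uses mean pooling for the modality-specific features. Which modality acts as the query in VGAR versus TGAR, and the helper names `cross_attention` and `dual_co_attention_fusion`, are illustrative assumptions.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query: np.ndarray, context: np.ndarray) -> np.ndarray:
    # Scaled dot-product attention: each query row attends over the context rows.
    # query: (n_q, d), context: (n_c, d) -> output: (n_q, d)
    d = query.shape[-1]
    scores = query @ context.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context


def dual_co_attention_fusion(visual: np.ndarray, textual: np.ndarray) -> np.ndarray:
    # visual: (n_regions, d) encoded image features; textual: (n_tokens, d) caption features.
    # Assumed VGAR: visual features query the caption tokens, then mean-pool.
    vgar = cross_attention(visual, textual).mean(axis=0)
    # Assumed TGAR: caption tokens query the image regions, then mean-pool.
    tgar = cross_attention(textual, visual).mean(axis=0)
    # Modality-specific features, here simply the mean-pooled unimodal encodings.
    v_feat = visual.mean(axis=0)
    t_feat = textual.mean(axis=0)
    # Concatenate the attentive vectors with the modality-specific features.
    return np.concatenate([vgar, tgar, v_feat, t_feat])


rng = np.random.default_rng(0)
visual = rng.normal(size=(49, 64))   # e.g. a 7x7 grid of image-region features
textual = rng.normal(size=(20, 64))  # e.g. 20 caption tokens
fused = dual_co_attention_fusion(visual, textual)
print(fused.shape)  # (256,)
```

In a full model, the fused vector would feed a classification head for either the hateful vs. not-hateful task or the four-way target task; the 256-dimensional output here is just 4 × d for d = 64. Keeping the unimodal features alongside the attentive vectors lets the classifier fall back on a single modality when the cross-modal signal is weak.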
