MIMIC: Multimodal Islamophobic Meme Identification and Classification
S M Jishanul Islam, Sahid Hossain Mustakim, Sadia Ahmmed, Md. Faiyaz Abdullah Sayeedi, Swapnil Khandoker, Syed Tasdid Azam Dhrubo, Nahid Hossain
TL;DR
The study targets the detection of anti-Muslim hate in memes, a challenging multimodal problem. It introduces a dedicated dataset of 953 memes and a ViLT-based classifier that fuses image patches with OCR-extracted meme text to identify hateful content. Results show that the ViLT approach, especially with data augmentation, generalizes well, reaching a maximum weighted $F_1$ score of $0.738$ under 10-fold cross-validation and outperforming baselines such as VisualBERT and CLIP variants. The work identifies dataset size as a constraint and suggests expanding the data and modalities to improve real-world content moderation.
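For readers unfamiliar with the reported metric, the following is a minimal sketch of a support-weighted $F_1$ score and a 10-fold index split, written in plain Python. It assumes that the weighted $F_1$ here follows the common definition (per-class $F_1$ averaged with weights proportional to class support); the function names `weighted_f1` and `kfold_indices`, the 0/1 label encoding, and the unshuffled contiguous folds are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (assumed definition)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Each class contributes in proportion to its share of true labels.
        score += (n / total) * f1
    return score


def kfold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds, without shuffling."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical labels: 1 = hateful, 0 = not hateful (encoding is an assumption).
example_score = weighted_f1([1, 1, 0, 0], [1, 0, 0, 0])
folds = list(kfold_indices(953, k=10))
```

With 953 samples, this split produces ten test folds of 95 or 96 memes each. In practice a stratified, shuffled split would usually be preferable for a dataset of this size; the source does not say which variant was used.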
Abstract
Anti-Muslim hate speech has emerged within memes, characterized by context-dependent, rhetorical messages that combine text and images to mimic humor while conveying Islamophobic sentiments. This work presents a novel dataset and proposes a classifier based on the Vision-and-Language Transformer (ViLT), specifically tailored to identify anti-Muslim hate in memes by integrating visual and textual representations. Our model leverages joint multimodal embeddings of meme images and their embedded text to capture nuanced Islamophobic narratives that are unique to meme culture, providing both high detection accuracy and interpretability.
