SAFE-MEME: Structured Reasoning Framework for Robust Hate Speech Detection in Memes
Palash Nandi, Shivam Sharma, Tanmoy Chakraborty
TL;DR
SAFE-MEME tackles robust hate-speech detection in memes by introducing structured multimodal reasoning. It presents two variants: SAFE-MEME-QA, a Q&A-style multimodal Chain-of-Thought (MM-CoT) approach, and SAFE-MEME-H, a hierarchical description-plus-classification method. The authors curate two datasets, MHS and MHS-Con, to benchmark fine-grained detection in regular and confounding (stress-test) scenarios, respectively. Empirically, SAFE-MEME-QA outperforms existing baselines on both datasets, while SAFE-MEME-H improves on MHS but outperforms only the multimodal baselines on MHS-Con, with gains up to around 6 percentage points in F1. An error analysis provides insights into robustness and failure patterns. The work highlights the potential of structured reasoning to improve multimodal hate-speech detection, while acknowledging dataset biases and ethical considerations.
Abstract
Memes act as cryptic tools for sharing sensitive ideas, often requiring contextual knowledge to interpret. This makes moderating multimodal memes challenging, as existing works either lack high-quality datasets on nuanced hate categories or rely on low-quality social media visuals. Here, we curate two novel multimodal hate speech datasets, MHS and MHS-Con, that capture fine-grained hateful abstractions in regular and confounding scenarios, respectively. We benchmark these datasets against several competing baselines. Furthermore, we introduce SAFE-MEME (Structured reAsoning FramEwork), a novel multimodal Chain-of-Thought-based framework employing Q&A-style reasoning (SAFE-MEME-QA) and hierarchical categorization (SAFE-MEME-H) to enable robust hate speech detection in memes. SAFE-MEME-QA outperforms existing baselines, achieving an average improvement of approximately 5% and 4% on MHS and MHS-Con, respectively. In comparison, SAFE-MEME-H achieves an average improvement of 6% on MHS while outperforming only multimodal baselines on MHS-Con. We show that fine-tuning a single-layer adapter within SAFE-MEME-H outperforms fully fine-tuned models in regular fine-grained hateful meme detection. However, full fine-tuning with a Q&A setup is more effective for handling confounding cases. We also systematically examine the error cases, offering valuable insights into the robustness and limitations of the proposed structured reasoning framework for analyzing hateful memes.
