MemeMind: A Large-Scale Multimodal Dataset with Chain-of-Thought Reasoning for Harmful Meme Detection

Hexiang Gu, Qifan Yu, Yuan Liu, Zikang Li, Saihui Hou, Jian Zhao, Zhaofeng He

TL;DR

This work tackles harmful meme detection by introducing MemeMind, a large-scale multimodal dataset with Chain-of-Thought (CoT) annotations and a taxonomy aligned with international standards. It also proposes MemeGuard, a three-stage, reasoning-enhanced detector that uses the CoT data and GRPO-based reinforcement learning to improve both accuracy and interpretability across diverse meme types. Extensive experiments show that MemeGuard outperforms state-of-the-art multimodal approaches on MemeMind, establishing a benchmark and a reusable framework for future research in online content moderation. Together, the dataset and method advance interpretable, scalable harmful meme analysis, with potential real-world impact on platform safety and content governance.

Abstract

As a multimodal medium combining images and text, memes frequently convey implicit harmful content through metaphor and humor, which makes detecting harmful memes a complex and challenging task. Although recent studies have made progress in detection accuracy and interpretability, large-scale, high-quality datasets for harmful memes remain scarce, and current methods still struggle to capture implicit risks and nuanced semantics. To address this, we construct MemeMind, a large-scale harmful meme dataset. Aligned with international standards and the context of the internet, MemeMind provides detailed Chain-of-Thought (CoT) reasoning annotations to support fine-grained analysis of the implicit intentions in memes. Based on this dataset, we further propose MemeGuard, a reasoning-oriented multimodal detection model that significantly improves both the accuracy of harmful meme detection and the interpretability of model decisions. Extensive experimental results demonstrate that MemeGuard outperforms existing state-of-the-art methods on the MemeMind dataset, establishing a solid foundation for future research in harmful meme detection.

Paper Structure

This paper contains 31 sections, 5 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: Examples of categories of harmful memes. Images (a) to (e) are English memes, and (f) to (j) are Chinese memes, each corresponding to a specific type of harmful content: (a, f) Discrimination, (b, g) Offensive, (c, h) Violence, (d, i) Vulgar, (e, j) Dissatisfaction.
  • Figure 2: An annotated example from our dataset. The meme is harmful due to its offensive and discriminatory implications. CAPTION reflects the interpretation of the meme, while REASONING documents the deep analysis of the meme. Finally, JUDGEMENT provides the overall classification result: harmful.
  • Figure 3: Dataset Construction Process. We defined scientific standards for harmful meme identification, applied Chain-of-Thought (CoT) annotations simulating human reasoning, implemented multi-model cross-verification for consistency, and performed manual sampling to ensure dataset quality.
  • Figure 4: Illustration of MemeGuard. In the Visual Enhancement stage, the model is trained with caption data to enhance its visual understanding capability. In the Reasoning Alignment stage, Chain-of-Thought (CoT) annotations and binary labels are used to align the model’s reasoning patterns. In the Reasoning Enhancement stage, the GRPO framework is applied to data from the previous stage, together with specifically designed reward functions, to further improve the model’s reasoning quality and classification performance.
  • Figure 5: Several annotated examples from our MemeMind dataset, covering both harmless and harmful cases as well as different types of harmful content. The harmful types shown are: (b) Discrimination, (c) Dissatisfaction, (d) Vulgar, (e) Violence, (f) Offensive.
  • ...and 1 more figure
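
The captions of Figures 2 and 4 mention a CAPTION / REASONING / JUDGEMENT annotation layout and a GRPO stage driven by reward functions. The sketch below is only an illustration of how those two pieces could fit together, not the authors' implementation. The text serialization of the three fields, the function names, and the reward weights (0.5 for a well-formed response, 1.0 for a correct judgement) are all assumptions made for this example. The one part taken from GRPO itself is the group-relative advantage: each sampled response's reward is normalized by the mean and standard deviation of the rewards in its group.

```python
import re
from statistics import mean, pstdev

# Hypothetical serialization of the three fields named in the Figure 2
# caption. The real dataset format is not given here; this is an assumption.
PATTERN = re.compile(
    r"CAPTION:\s*(?P<caption>.+?)\s*"
    r"REASONING:\s*(?P<reasoning>.+?)\s*"
    r"JUDGEMENT:\s*(?P<judgement>harmful|harmless)\s*$",
    re.DOTALL | re.IGNORECASE,
)


def parse_response(text):
    """Return the three fields as a dict, or None if the layout is violated."""
    match = PATTERN.search(text.strip())
    if match is None:
        return None
    fields = {key: value.strip() for key, value in match.groupdict().items()}
    fields["judgement"] = fields["judgement"].lower()
    return fields


def reward(text, gold_label):
    """Illustrative reward: format bonus plus accuracy bonus (assumed weights)."""
    fields = parse_response(text)
    if fields is None:
        return 0.0  # malformed output earns nothing
    score = 0.5  # assumed bonus for following the expected layout
    if fields["judgement"] == gold_label:
        score += 1.0  # assumed bonus for the correct binary label
    return score


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled responses, as in GRPO."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three made-up responses sampled for one meme whose gold label is "harmful".
    group = [
        "CAPTION: a\nREASONING: b\nJUDGEMENT: harmful",
        "CAPTION: a\nREASONING: b\nJUDGEMENT: harmless",
        "it is harmful",
    ]
    rewards = [reward(text, "harmful") for text in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Under these assumptions, a response that follows the layout and reaches the correct judgement receives a positive advantage, while a malformed response receives a negative one. When every response in a group earns the same reward, all advantages are zero and the group contributes no learning signal.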