MemeMind: A Large-Scale Multimodal Dataset with Chain-of-Thought Reasoning for Harmful Meme Detection
Hexiang Gu, Qifan Yu, Yuan Liu, Zikang Li, Saihui Hou, Jian Zhao, Zhaofeng He
TL;DR
This work addresses harmful meme detection by introducing MemeMind, a large-scale multimodal dataset with Chain-of-Thought (CoT) annotations and a taxonomy aligned with international standards. It also proposes MemeGuard, a three-stage, reasoning-enhanced detector that uses CoT data and GRPO-based reinforcement learning to improve both accuracy and interpretability across diverse meme types. Extensive experiments show that MemeGuard outperforms state-of-the-art multimodal approaches on MemeMind, establishing a benchmark and a reusable framework for future research in online content moderation. Together, the dataset and method advance interpretable, scalable harmful meme analysis, with potential real-world impact on platform safety and content governance.
Abstract
As a multimodal medium combining images and text, memes frequently convey implicit harmful content through metaphor and humor, which makes harmful meme detection a challenging task. Although recent studies have improved detection accuracy and interpretability, large-scale, high-quality datasets for harmful memes remain scarce, and current methods still struggle to capture implicit risks and nuanced semantics. To address this gap, we construct MemeMind, a large-scale harmful meme dataset. Aligned with international standards and the context of the internet, MemeMind provides detailed Chain-of-Thought (CoT) reasoning annotations to support fine-grained analysis of the implicit intentions in memes. Based on this dataset, we further propose MemeGuard, a reasoning-oriented multimodal detection model that significantly improves both the accuracy of harmful meme detection and the interpretability of model decisions. Extensive experiments demonstrate that MemeGuard outperforms existing state-of-the-art methods on the MemeMind dataset, establishing a solid foundation for future research in harmful meme detection.
