MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention

Prince Jha; Raghav Jain; Konika Mandal; Aman Chadha; Sriparna Saha; Pushpak Bhattacharyya

TL;DR

MemeGuard addresses proactive, multimodal meme intervention by grounding each intervention in meme-specific context. It introduces a meme-focused Visual Language Model (VLMeme) and a Multimodal Knowledge Selection (MKS) mechanism that curates relevant contextual knowledge, which then guides a general-purpose LLM in generating interventions. The ICMM dataset provides high-quality, human-annotated interventions for toxic memes as a benchmark. Empirical results show that MemeGuard improves intervention quality under both automatic and human evaluation across multiple LLMs, underscoring the value of domain-specific grounding and multimodal context for mitigating cyberbullying in memes.

Abstract

In the digital world, memes present a unique challenge for content moderation due to their potential to spread harmful content. Although detection methods have improved, proactive solutions such as intervention are still limited, with current research focusing mostly on text-based content, neglecting the widespread influence of multimodal content like memes. Addressing this gap, we present MemeGuard, a comprehensive framework leveraging Large Language Models (LLMs) and Visual Language Models (VLMs) for meme intervention. MemeGuard harnesses a specially fine-tuned VLM, VLMeme, for meme interpretation, and a multimodal knowledge selection and ranking mechanism (MKS) for distilling relevant knowledge. This knowledge is then employed by a general-purpose LLM to generate contextually appropriate interventions. Another key contribution of this work is the Intervening Cyberbullying in Multimodal Memes (ICMM) dataset, a high-quality, labeled dataset featuring toxic memes and their corresponding human-annotated interventions. We leverage ICMM to test MemeGuard, demonstrating its proficiency in generating relevant and effective responses to toxic memes.
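
The select-then-generate flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how MKS scores or ranks knowledge, so this sketch assumes a plain bag-of-words cosine similarity between the meme text and each candidate knowledge sentence, with a fixed cutoff. The function names `select_knowledge` and `build_prompt`, the threshold value, the prompt wording, and the example sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_knowledge(meme_text, candidates, threshold=0.15):
    """Rank candidate sentences by similarity to the meme text and
    keep those at or above the threshold (an assumed stand-in for MKS)."""
    query = bag_of_words(meme_text)
    scored = [(cosine(query, bag_of_words(c)), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored if score >= threshold]


def build_prompt(meme_text, knowledge):
    """Assemble a hypothetical prompt for a general-purpose LLM."""
    context = "\n".join(f"- {k}" for k in knowledge)
    return (
        f"Meme text: {meme_text}\n"
        f"Relevant context:\n{context}\n"
        "Write a short, respectful intervention addressing this meme."
    )


# Hypothetical inputs: in the described framework, the candidate
# sentences would come from the meme-focused VLM, not be hand-written.
meme_text = "women belong in the kitchen"
candidates = [
    "The meme implies that women belong only in the kitchen.",
    "Stereotypes about women and housework are harmful.",
    "Blue sky fills most of this image.",
]

selected = select_knowledge(meme_text, candidates)
prompt = build_prompt(meme_text, selected)
print(prompt)
```

In this toy run the third sentence shares no tokens with the meme text, so it is filtered out before the prompt is built. A real system would likely replace the token-overlap score with a learned multimodal or sentence-embedding similarity, and the cutoff would need tuning.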

Paper Structure (21 sections, 2 equations, 10 figures, 6 tables)

Introduction
Related Works
ICMM Dataset
Annotation Training
Main Annotation
Methodology
Toxic Meme Analysis Module
Multimodal Knowledge Selection (MKS)
Intervention Generation Module
Experimental Results and Discussion
Conclusion and Future Work
Appendix
Experimental Setup
Hyperparameters
Performance of VLMeme
...and 6 more sections

Figures (10)

Figure 1: An instance of the meme intervention task.
Figure 2: Flowchart depicting the annotation guideline illustrated with a sample example.
Figure 3: The proposed framework of our MemeGuard system. Sentences highlighted in green within the Generated Context block symbolize relevant knowledge, while those in red signify irrelevant knowledge.
Figure 4: Architectural Diagram of VLMeme.
Figure 5: BERTScore variation for GPT3.5-Turbo and FLAN-T5 with different threshold values.
...and 5 more figures
