Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations

Prince Jha; Krishanu Maity; Raghav Jain; Apoorv Verma; Sriparna Saha; Pushpak Bhattacharyya

TL;DR

This work tackles explainability in multimodal, code-mixed cyberbullying memes by introducing MultiBully-Ex and the MExCCM task, which require both textual rationales and visual evidence. It proposes a CLIP projection-based, shared-private multitask architecture with three components: a Cross-Modal Neck, a Vision-Informed Textual Seq2Seq model, and a Linguistically-Sensitive Visual Segmentation model, augmented by a loss-prioritization scheme. Empirical results show that multimodal, multitask models outperform single-task and unimodal baselines on both textual and visual explainability, with human evaluations indicating high relevance for generated rationales. This advances interpretable meme moderation by combining robust multimodal representations with targeted explainability, and points to future work on stereotype detection and cross-language generalization.

Abstract

Internet memes have gained significant influence in communicating political, psychological, and sociocultural ideas. While memes are often humorous, their use for trolling and cyberbullying has risen. Although a wide variety of effective deep learning-based models have been developed for detecting offensive multimodal memes, only a few works address explainability. Recent regulations, such as the "right to explanation" in the General Data Protection Regulation, have spurred research into interpretable models rather than a focus on performance alone. Motivated by this, we introduce MultiBully-Ex, the first benchmark dataset for multimodal explanation from code-mixed cyberbullying memes. Here, both visual and textual modalities are highlighted to explain why a given meme is cyberbullying. We propose a Contrastive Language-Image Pretraining (CLIP) projection-based multimodal shared-private multitask approach for visual and textual explanation of a meme. Experimental results demonstrate that training with multimodal explanations yields reliable improvements both in generating textual justifications and in identifying the visual evidence supporting a decision.

Paper Structure (30 sections, 9 equations, 6 figures, 3 tables)

Introduction
Related Works
Multimodal Bully Explanations Dataset (MultiBully-Ex)
Annotation training
Main Annotation
Methodology
CLIP Projection-Based Cross-Modal Neck
Gated Visual Projection
Gated Textual Projection
Vision-Informed Textual Seq2Seq Model
Linguistically Sensitive Visual Segmentation Model
Loss Prioritization
Results and Discussion
Quantitative analysis
Qualitative Analysis
...and 15 more sections

Figures (6)

Figure 1: Cyberbullying explanation in memes. The aim is to highlight both the image and the text as an explanation of why the given meme constitutes bullying.
Figure 2: CLIP projection-based (CP) multimodal shared-private multitask architecture. The Vision-Informed Textual Seq2Seq model is represented by a pink dotted box. The Cross-Modal Projection Neck is signified by a blue dotted box. The Linguistically Sensitive Visual Segmentation model is indicated by a red dotted box. Lx denotes the number of transformer layers.
Figure 3: Human annotation vs. the proposed model's visual and textual explanations. Green highlights indicate agreement between the human annotator and the model. Red highlighted tokens are predicted by the model but not by human annotators.
Figure 4: Distribution for Length of Meme Text
Figure 5: Distribution for Length of Annotated Rationales
...and 1 more figures
