IITK at SemEval-2024 Task 4: Hierarchical Embeddings for Detection of Persuasion Techniques in Memes
Shreenaga Chikoti, Shrey Mehta, Ashutosh Modi
TL;DR
The paper tackles multilingual detection of persuasion techniques in memes (SemEval-2024 Task 4) by coupling HypEmo's hierarchical hyperbolic label embeddings with a class-definition prediction framework, and extends the approach to multimodal inputs via CLIP. It deploys task-specific ensembles across three subtasks: text-only hierarchical multi-label classification, text+image hierarchical multi-label classification, and binary text+image detection, achieving hierarchical F1 scores of 0.60, 0.67, and 0.48, respectively. Key findings are that textual cues predominantly drive performance, that CLIP-derived visual features offer limited gains except for certain techniques, and that a union ensemble of models yields robust results. The work contributes a cohesive multimodal approach to persuasion detection in memes and provides code for reproducibility.
Abstract
Memes are one of the most popular types of content used in online disinformation campaigns. They are particularly effective on social media platforms, where they can easily reach many users. Memes in a disinformation campaign achieve their goal of influencing users through several rhetorical and psychological techniques, such as causal oversimplification, name-calling, and smear. The SemEval 2024 Task 4 \textit{Multilingual Detection of Persuasion Techniques in Memes}, which concerns identifying such techniques in memes, is divided into three sub-tasks: ($\mathbf{1}$) hierarchical multi-label classification using only the textual content of the meme, ($\mathbf{2}$) hierarchical multi-label classification using both the textual and visual content of the meme, and ($\mathbf{3}$) binary classification of whether the meme contains a persuasion technique, using its textual and visual content. This paper proposes an ensemble of Class Definition Prediction (CDP) and hyperbolic embeddings-based approaches for this task. We enhance meme classification accuracy and comprehensiveness by integrating HypEmo's hierarchical label embeddings (Chen et al., 2023) and a multi-task learning framework for emotion prediction. We achieve a hierarchical F1-score of 0.60, 0.67, and 0.48 on the respective sub-tasks.
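For readers unfamiliar with the metric reported above, the following is a minimal sketch of one common way to compute a hierarchical F1-score: each gold and predicted label set is expanded with its ancestors in the label hierarchy before precision and recall are computed, so a prediction that lands near the correct label earns partial credit. The `PARENT` mapping is a hypothetical toy tree chosen for illustration, and the function names are assumptions; the official task taxonomy and scorer may differ in structure and in how partial matches are weighted.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy (child -> parent), for illustration only.
# The official task taxonomy is larger and may not be a strict tree.
PARENT: Dict[str, str] = {
    "Name calling": "Ad Hominem",
    "Smear": "Ad Hominem",
    "Ad Hominem": "Ethos",
    "Causal Oversimplification": "Simplification",
    "Simplification": "Logos",
}


def with_ancestors(labels: Set[str]) -> Set[str]:
    """Return the labels together with all of their ancestors."""
    expanded: Set[str] = set()
    for label in labels:
        node = label
        while node is not None:
            expanded.add(node)
            node = PARENT.get(node)  # None once a root is reached
    return expanded


def hierarchical_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over ancestor-expanded label sets."""
    overlap = gold_total = pred_total = 0
    for g, p in zip(gold, pred):
        g_exp, p_exp = with_ancestors(g), with_ancestors(p)
        overlap += len(g_exp & p_exp)
        gold_total += len(g_exp)
        pred_total += len(p_exp)
    if overlap == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / gold_total
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A sibling-label mistake still shares the ancestors "Ad Hominem" and "Ethos".
    print(hierarchical_f1([{"Smear"}], [{"Name calling"}]))
```

In this sketch, predicting a sibling of the gold label scores higher than predicting a label from an unrelated branch, which is the behaviour that distinguishes a hierarchical score from a flat F1.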
