MemeMind at ArAIEval Shared Task: Spotting Persuasive Spans in Arabic Text with Persuasion Techniques Identification
Md Rafiul Biswas, Zubair Shah, Wajdi Zaghouani
TL;DR
This work tackles the problem of spotting propagandistic spans and identifying persuasion techniques in Arabic text, focusing on tweets and news paragraphs from the ArAIEval dataset. It proposes a transformer-based pipeline using AraBERT-base with a token-classification head and a two-phase fine-tuning regimen, achieving an F1 score of 0.2774 and securing 3rd place on the shared task leaderboard. The approach demonstrates the practicality of fine-grained, token-level propaganda detection in Arabic and provides a reproducible framework, along with comparative model analyses. Overall, the paper advances Arabic NLP for misinformation detection by validating a strong, GPU-efficient baseline and outlining concrete directions for improvement in dialectal coverage and feature augmentation.
Abstract
This paper focuses on detecting propagandistic spans and persuasion techniques in Arabic text from tweets and news paragraphs. Each entry in the dataset contains a text sample and corresponding labels that indicate the start and end positions of propaganda techniques within the text. Tokens falling within a labeled span were assigned a "B" (Begin) or "I" (Inside) tag corresponding to the specific propaganda technique, and all remaining tokens were assigned "O" (Outside). Using attention masks, we brought the input sequences to a uniform length and assigned BIO tags to each token based on the provided labels. We then used the pre-trained AraBERT-base model for Arabic text tokenization and embeddings, with a token classification layer on top to identify propaganda techniques. Our training process follows a two-phase fine-tuning approach: we first train only the classification layer for a few epochs, then fine-tune the full model, updating all parameters. This allows the model to adapt to the specific characteristics of the propaganda detection task while leveraging the knowledge captured by the pre-trained AraBERT model. Our approach achieved an F1 score of 0.2774, securing the 3rd position on the leaderboard of Task 1.
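The BIO labeling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace tokenization instead of the AraBERT tokenizer, spans given as hypothetical `(start, end, technique)` character offsets with an exclusive end, and a placeholder technique name. Where spans overlap, the first matching span wins, which is a simplification. The helper names `bio_tags` and `pad_with_mask` are illustrative only.

```python
import re


def bio_tags(text, spans):
    """Return (tokens, tags) with one BIO tag per whitespace token.

    spans: list of (start, end, technique) character offsets, end exclusive.
    This span format is an assumption made for illustration.
    """
    tokens, tags = [], []
    started = set()  # indices of spans that already received their "B" tag
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for idx, (start, end, technique) in enumerate(spans):
            # A token belongs to a span when their character ranges overlap.
            if tok_start < end and tok_end > start:
                prefix = "I" if idx in started else "B"
                started.add(idx)
                tag = f"{prefix}-{technique}"
                break  # simplification: first matching span wins
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


def pad_with_mask(tags, max_len, pad_tag="O"):
    """Truncate or pad tags to max_len and build the matching attention mask."""
    kept = tags[:max_len]
    n_pad = max_len - len(kept)
    # 1 marks a real token, 0 marks padding.
    return kept + [pad_tag] * n_pad, [1] * len(kept) + [0] * n_pad


# Usage with an English placeholder sentence; offsets are computed, not hand-counted.
text = "they spread shameful lies about everyone"
phrase = "shameful lies"
start = text.index(phrase)
tokens, tags = bio_tags(text, [(start, start + len(phrase), "Loaded_Language")])
padded_tags, attention_mask = pad_with_mask(tags, max_len=8)
```

In a real pipeline the tags would additionally have to be aligned to subword pieces produced by the model's tokenizer, which this sketch does not attempt.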
