Investigating Persuasion Techniques in Arabic: An Empirical Study Leveraging Large Language Models
Abdurahmman Alzahrani, Eyad Babkier, Faisal Yanbaawi, Firas Yanbaawi, Hassan Alhuzali
TL;DR
This study tackles the detection of persuasion techniques in Arabic social media using the ArAlEval dataset, framing the task as both presence-absence (binary) and type classification (multi-label with 24 techniques). It compares three learning paradigms—feature extraction with frozen PLMs, full fine-tuning of PLMs, and prompt engineering with GPT-3.5/4—across multiple Arabic PLMs (AraBERTV2, MARBERT, CAMeLBERT, GigaBERT). The key finding is that full fine-tuning, particularly with GigaBERT, achieves the best performance (Task1A: 0.865 F1-Micro; Task1B: 0.532 F1-Micro and 0.464 Jaccard), while few-shot prompting with GPT models shows potential but requires further refinement to reach parity with fine-tuned PLMs. The work provides actionable insights for Arabic NLP in misinformation detection and highlights promising directions for future research, including specialized loss functions for multi-label tasks and expanded exploration of few-shot prompting.
Abstract
In the current era of digital communication and widespread use of social media, it is crucial to develop an understanding of persuasive techniques employed in written text. This knowledge is essential for effectively discerning accurate information and making informed decisions. To address this need, this paper presents a comprehensive empirical study focused on identifying persuasive techniques in Arabic social media content. To achieve this objective, we utilize Pre-trained Language Models (PLMs) and leverage the ArAlEval dataset, which encompasses two tasks: binary classification to determine the presence or absence of persuasion techniques, and multi-label classification to identify the specific types of techniques employed in the text. Our study explores three different learning approaches by harnessing the power of PLMs: feature extraction, fine-tuning, and prompt engineering techniques. Through extensive experimentation, we find that the fine-tuning approach yields the highest results on the aforementioned dataset, achieving an f1-micro score of 0.865 and an f1-weighted score of 0.861. Furthermore, our analysis sheds light on an interesting finding. While the performance of the GPT model is relatively lower compared to the other approaches, we have observed that by employing few-shot learning techniques, we can enhance its results by up to 20\%. This offers promising directions for future research and exploration in this topic\footnote{Upon Acceptance, the source code will be released on GitHub.}.
