Nullpointer at ArAIEval Shared Task: Arabic Propagandist Technique Detection with Token-to-Word Mapping in Sequence Tagging

Abrar Abir; Kemal Oflazer

TL;DR

The paper tackles Arabic propaganda technique detection in multi-genre text by fine-tuning AraBERT v2 with a neural network classifier for sequence tagging. It compares token-level and word-level tagging strategies and finds that word-level prediction from the first token of each word, combined with genre encoding (tweet vs. news), performs best. A preprocessing pipeline handles Unicode control characters, misaligned span annotations, and user mentions so that the annotations are clean. The submitted system scored 25.41, placing 4th on the leaderboard; the final model, trained on the merged training and development data, reached 26.68 after post-submission improvements. The results point to the value of token-to-word mapping and genre information for Arabic propaganda detection.

Abstract

This paper investigates the optimization of propaganda technique detection in Arabic text, including tweets and news paragraphs, from ArAIEval shared task 1. Our approach involves fine-tuning the AraBERT v2 model with a neural network classifier for sequence tagging. Experimental results show that relying on the first token of each word for technique prediction produces the best performance. In addition, incorporating genre information as a feature further improves the model's performance. Our system achieved a score of 25.41, placing us 4th on the leaderboard. Post-submission improvements raised our score to 26.68.
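The first-token strategy mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `first_token_labels`, the placeholder labels `TECH_A` and `TECH_B`, and the single-label-per-token simplification are assumptions made for illustration. The `word_ids` list is assumed to have the same shape as the output of `word_ids()` on a Hugging Face fast tokenizer encoding, where special tokens map to `None` and each subword token maps to the index of the word it belongs to.

```python
from typing import List, Optional


def first_token_labels(
    word_ids: List[Optional[int]], token_labels: List[str]
) -> List[str]:
    """Give each word the label predicted for its first subword token.

    word_ids: one entry per token; None marks special tokens such as
        [CLS] and [SEP], integers mark the word a token belongs to.
    token_labels: one predicted label per token (hypothetical labels).
    Returns one label per word, in order of first appearance.
    """
    if len(word_ids) != len(token_labels):
        raise ValueError("word_ids and token_labels must have equal length")

    word_labels: List[str] = []
    seen = set()
    for word_id, label in zip(word_ids, token_labels):
        # Skip special tokens and every subword after the first one.
        if word_id is None or word_id in seen:
            continue
        seen.add(word_id)
        word_labels.append(label)
    return word_labels


# Three words split into 2, 1, and 3 subword tokens respectively.
example_word_ids = [None, 0, 0, 1, 2, 2, 2, None]
example_token_labels = ["O", "TECH_A", "O", "O", "TECH_B", "TECH_B", "O", "O"]
example_word_labels = first_token_labels(example_word_ids, example_token_labels)
```

In this sketch, predictions on non-initial subword tokens are ignored, so a word's label depends only on its first token regardless of how the tokenizer splits the rest of the word.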

Paper Structure (26 sections, 2 figures, 2 tables)

Introduction
Related Work
Preprocessing the Data
Dataset
Removal of Unicode Control Characters
Handling Misaligned Span Annotations
Normalizing User Mentions
Manual Handling of Failed Substring Searches
Sequence Tagging Prediction Approaches
Token-Level Training and Prediction
Token-Level Training with Word-Level Prediction
Majority Label Assignment
First Token Label Assignment
Word-Level Training and Prediction
The Propagandistic Technique Detection System
...and 11 more sections

Figures (2)

Figure 1: Diagram of the (a) Classifier, (b) Sequence Tagging System
