WojoodNER 2024: The Second Arabic Named Entity Recognition Shared Task
Mustafa Jarrar, Nagham Hamad, Mohammed Khalilia, Bashar Talafha, AbdelRahim Elmadany, Muhammad Abdul-Mageed
TL;DR
WojoodNER-2024 advances Arabic NER by introducing WojoodFine, a fine-grained, nested NER dataset (~$550{,}000$ tokens, $51$ entity types) covering MSA and two dialects. The paper defines three subtasks: Closed-Track Flat Fine-Grained NER, Closed-Track Nested Fine-Grained NER, and an Open-Track NER for the Gaza War. Participating systems range from transfer-learning baselines to transformer- and LLM-based approaches. The paper reports participation (43 registered teams; five teams in the Flat subtask, two of which also entered the Nested subtask and one the Open-Track subtask) and top results of $F_1 = 91\%$ (Flat), $92\%$ (Nested), and $73.7\%$ (Open-Track). These results show clear progress while leaving gaps in fine-grained and cross-domain NER. Methods include AraBERTv2 baselines, TANL-based pipelines, datastore-enhanced inference, and LLM-enabled systems for the Gaza subtask. The authors identify broader dialect coverage and more varied data distributions as next steps for Arabic NER research in practical settings.
Abstract
We present WojoodNER-2024, the second Arabic Named Entity Recognition (NER) Shared Task. In WojoodNER-2024, we focus on fine-grained Arabic NER. We provided participants with a new Arabic fine-grained NER dataset, WojoodFine, annotated with entity subtypes. WojoodNER-2024 encompassed three subtasks: (i) Closed-Track Flat Fine-Grained NER, (ii) Closed-Track Nested Fine-Grained NER, and (iii) an Open-Track NER for the Israeli War on Gaza. A total of 43 unique teams registered for the shared task. Five teams participated in the Flat Fine-Grained Subtask; of these, two also tackled the Nested Fine-Grained Subtask and one participated in the Open-Track NER Subtask. The winning teams achieved F1 scores of 91% and 92% in the Flat Fine-Grained and Nested Fine-Grained Subtasks, respectively. The sole team in the Open-Track Subtask achieved an F1 score of 73.7%.
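The abstract reports F1 scores without defining them. A common convention for flat and nested NER is exact-match, entity-level micro-F1 over (start, end, label) spans. Under that convention, nested annotation is simply a set of spans that are allowed to overlap. The sketch below assumes this convention; the shared task's official scorer may differ in details. The span tuples, the `micro_f1` function, and the labels are illustrative, not taken from the task's tooling.

```python
from typing import List, Set, Tuple

# Hypothetical span representation: (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def micro_f1(gold: List[Set[Span]], pred: List[Set[Span]]) -> float:
    """Exact-match, entity-level micro-F1 pooled over all sentences.

    Each sentence is a set of spans. Nested annotations are simply
    overlapping spans in the same set, so the same function scores
    flat and nested output.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # boundaries and label both match
        fp += len(p - g)  # predicted spans not in gold
        fn += len(g - p)  # gold spans the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy nested example with illustrative labels: an ORG span over tokens 0-2
# contains a GPE span over token 2. The system gets the outer span right
# but mislabels the inner one, giving one TP, one FP, and one FN.
gold = [{(0, 3, "ORG"), (2, 3, "GPE")}]
pred = [{(0, 3, "ORG"), (2, 3, "LOC")}]
print(micro_f1(gold, pred))  # 0.5
```

Under exact matching, a fine-grained label error counts as both a false positive and a false negative. This is one reason fine-grained scores tend to trail coarse-grained ones.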
