Event-Arguments Extraction Corpus and Modeling using BERT for Arabic
Alaa Aljabari, Lina Duaibes, Mustafa Jarrar, Mohammed Khalilia
TL;DR
This work tackles Arabic event-argument extraction (EAE) by creating WojoodHadath, an extension of the Wojood corpus annotated with agent, location, and date arguments, totaling $2{,}588$ relations across $2{,}772$ event mentions. It frames EAE as a natural language inference (NLI) task and builds HadathNLI, a premise-hypothesis dataset used to train a BERT-based EAE model that achieves $F1=94.01\%$ on HadathNLI and $F1=83.59\%$ on the cross-domain WojoodOutOfDomain test set. An end-to-end EAE system is implemented in SinaTools, combining a named-entity recognizer, template-driven hypothesis construction, and the NLI-based EAE classifier, with ablation studies guiding the choice of templates and loss function. The paper also introduces WojoodOutOfDomain to test generalization across ten domains and reports substantial improvements over baseline configurations. Overall, the work delivers new Arabic EAE resources, an NLI-based methodology, and an end-to-end system, all publicly released to support further research and applications in knowledge graph construction and information extraction.
Abstract
Event-argument extraction is a challenging task, particularly in Arabic due to sparse linguistic resources. To fill this gap, we introduce the WojoodHadath corpus ($550$k tokens) as an extension of Wojood, enriched with event-argument annotations. We used three types of event arguments, agent, location, and date, which we annotated as relation types. Our inter-annotator agreement evaluation resulted in an $82.23\%$ Kappa score and an $87.2\%$ $F_1$-score. Additionally, we propose a novel method for event relation extraction using BERT, in which we treat the task as textual entailment. This method achieves an $F_1$-score of $94.01\%$. To further evaluate the generalization of our proposed method, we collected and annotated another out-of-domain corpus (about $80$k tokens) called WojoodOutOfDomain and used it as a second test set, on which our approach achieved promising results ($83.59\%$ $F_1$-score). Finally, we propose an end-to-end system for event-argument extraction. This system is implemented as part of SinaTools, and both corpora are publicly available at \url{https://sina.birzeit.edu/wojood}.
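To make the entailment framing concrete, the following is a minimal sketch of how premise-hypothesis pairs could be built from a sentence with tagged mentions. It is an illustration under stated assumptions, not the system's actual implementation: the English templates, the tag-to-role mapping, and the names `Mention`, `TEMPLATES`, `CANDIDATE_ROLES`, and `build_pairs` are all hypothetical, and the example sentence is in English only for readability.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Mention:
    text: str
    tag: str  # entity tag, e.g. "EVENT", "ORG", "GPE", "DATE" (assumed tag names)


# Hypothetical hypothesis templates, one per argument type.
TEMPLATES = {
    "agent": "{arg} is an agent of {event}",
    "location": "{event} took place in {arg}",
    "date": "{event} happened on {arg}",
}

# Hypothetical mapping from entity tags to the argument roles they may fill.
CANDIDATE_ROLES = {
    "PERS": ["agent"],
    "ORG": ["agent"],
    "GPE": ["location"],
    "LOC": ["location"],
    "DATE": ["date"],
}


def build_pairs(sentence: str, mentions: List[Mention]) -> List[Tuple[str, str, str]]:
    """Return (premise, hypothesis, role) triples for every event/candidate pair."""
    events = [m for m in mentions if m.tag == "EVENT"]
    candidates = [m for m in mentions if m.tag != "EVENT"]
    pairs = []
    for event in events:
        for arg in candidates:
            for role in CANDIDATE_ROLES.get(arg.tag, []):
                hypothesis = TEMPLATES[role].format(arg=arg.text, event=event.text)
                pairs.append((sentence, hypothesis, role))
    return pairs


sentence = "The summit was held in Cairo in 2019 by the Arab League."
mentions = [
    Mention("The summit", "EVENT"),
    Mention("Cairo", "GPE"),
    Mention("2019", "DATE"),
    Mention("the Arab League", "ORG"),
]
pairs = build_pairs(sentence, mentions)
for premise, hypothesis, role in pairs:
    print(role, "|", hypothesis)
```

Each resulting pair would then be passed to an entailment classifier, and a relation would be kept only when the hypothesis is judged to follow from the premise. Restricting candidates by entity tag, as assumed here, keeps the number of pairs per sentence small.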
