Event-Arguments Extraction Corpus and Modeling using BERT for Arabic

Alaa Aljabari; Lina Duaibes; Mustafa Jarrar; Mohammed Khalilia

TL;DR

This work tackles Arabic event-argument extraction (EAE) by creating WojoodHadath, an extension of the Wojood corpus annotated with agent, location, and date arguments, totaling $2{,}588$ relations across $2{,}772$ event mentions. It frames EAE as a natural language inference (NLI) task and builds HadathNLI, a premise-hypothesis dataset used to train a BERT-based EAE model that achieves $F_1 = 94.01\%$ on HadathNLI and $F_1 = 83.59\%$ on the out-of-domain WojoodOutOfDomain test set. An end-to-end EAE system is implemented in SinaTools, combining a named-entity recognizer, template-driven hypothesis construction, and the NLI-based EAE classifier, with ablation studies guiding the choice of templates and loss function. The paper also introduces WojoodOutOfDomain to test generalization across ten domains and reports improvements over baseline configurations. Overall, the work delivers new Arabic EAE resources, an NLI-based methodology, and an end-to-end system, all publicly released to support further research and applications in knowledge graph construction and information extraction.
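To make the NLI framing concrete, the following is a minimal sketch of template-driven hypothesis construction. It is an illustration, not the authors' implementation: the English templates, the entity tag names (`EVENT`, `PERS`, `ORG`, `LOC`, `DATE`), the tag-to-relation mapping, and the function `build_nli_pairs` are all assumptions introduced here. The actual system works on Arabic text with its own templates and would pass each pair to the BERT-based classifier.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mention:
    text: str
    tag: str  # illustrative tags: "EVENT", "PERS", "ORG", "LOC", "DATE"


# Hypothetical English templates, one per argument type (assumption).
TEMPLATES = {
    "agent": "{arg} is an agent of {event}",
    "location": "{event} took place in {arg}",
    "date": "{event} happened on {arg}",
}

# Hypothetical mapping from entity tag to the relation types it may fill.
CANDIDATE_RELATIONS = {
    "PERS": ["agent"],
    "ORG": ["agent"],
    "LOC": ["location"],
    "DATE": ["date"],
}


def build_nli_pairs(sentence, mentions):
    """Pair every event mention with every compatible argument candidate.

    The sentence is the premise; each filled template is a hypothesis that
    an NLI classifier would label as entailed (relation holds) or not.
    """
    events = [m for m in mentions if m.tag == "EVENT"]
    candidates = [m for m in mentions if m.tag != "EVENT"]
    pairs = []
    for event, arg in product(events, candidates):
        for relation in CANDIDATE_RELATIONS.get(arg.tag, []):
            hypothesis = TEMPLATES[relation].format(arg=arg.text, event=event.text)
            pairs.append(
                {
                    "premise": sentence,
                    "hypothesis": hypothesis,
                    "relation": relation,
                    "event": event.text,
                    "argument": arg.text,
                }
            )
    return pairs


if __name__ == "__main__":
    sentence = "The conference was held in Ramallah in 2023."
    mentions = [
        Mention("The conference", "EVENT"),
        Mention("Ramallah", "LOC"),
        Mention("2023", "DATE"),
    ]
    for pair in build_nli_pairs(sentence, mentions):
        print(pair["relation"], "|", pair["hypothesis"])
```

Restricting hypotheses to tag-compatible relation types keeps the number of premise-hypothesis pairs small. Whether the original system filters candidates this way is not stated in the summary above.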

Abstract

Event-argument extraction is a challenging task, particularly in Arabic due to sparse linguistic resources. To fill this gap, we introduce the WojoodHadath corpus ($550$k tokens) as an extension of Wojood, enriched with event-argument annotations. We used three types of event arguments: $agent$, $location$, and $date$, which we annotated as relation types. Our inter-annotator agreement evaluation resulted in a Kappa score of $82.23\%$ and an $F_1$-score of $87.2\%$. Additionally, we propose a novel method for event relation extraction using BERT, in which we treat the task as textual entailment. This method achieves an $F_1$-score of $94.01\%$. To further evaluate the generalization of our proposed method, we collected and annotated another out-of-domain corpus (about $80$k tokens) called WojoodOutOfDomain and used it as a second test set, on which our approach achieved promising results ($83.59\%$ $F_1$-score). Finally, we propose an end-to-end system for event-argument extraction. This system is implemented as part of SinaTools, and both corpora are publicly available at https://sina.birzeit.edu/wojood
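The two agreement figures reported in the abstract can be illustrated with a small worked sketch. It assumes that the Kappa score is Cohen's kappa over per-item labels from two annotators, and that the $F_1$-score treats one annotator's relations as the reference. The abstract does not state either detail, and the function names and toy data below are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def pairwise_f1(reference_relations, other_relations):
    """F1 of one annotator's relation set against the other's as reference."""
    reference, other = set(reference_relations), set(other_relations)
    true_positives = len(reference & other)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(other)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels for four candidate (event, argument) pairs.
    annotator_a = ["agent", "agent", "location", "none"]
    annotator_b = ["agent", "location", "location", "none"]
    print(f"kappa = {cohen_kappa(annotator_a, annotator_b):.4f}")

    # Toy relation triples: (event id, argument id, relation type).
    relations_a = [("e1", "a1", "agent"), ("e1", "l1", "location")]
    relations_b = [("e1", "a1", "agent"), ("e2", "d1", "date")]
    print(f"F1 = {pairwise_f1(relations_a, relations_b):.2f}")
```

In the toy label example, observed agreement is $3/4$ and chance agreement is $5/16$, giving $\kappa = (0.75 - 0.3125) / (1 - 0.3125) = 7/11 \approx 0.636$.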

Paper Structure (28 sections, 7 equations, 4 figures, 13 tables)

Introduction
Related works
Dataset and annotation
Corpus Preparation
Annotation Process
Relationship Types
Annotation Guidelines
Corpus Statistics
Inter-annotator Agreement
Calculating $\kappa$
Calculating $F_1$-score
Discussion and Annotation Challenges
Event-Argument Extraction (EAE)
Problem Formulation
Event Relation Extraction as NLI
...and 13 more sections

Figures (4)

Figure 1: An event annotated with its arguments.
Figure 2: Annotating an event entity with its arguments.
Figure 5: Framing the EAE task as an NLI task.
Figure 6: End-To-End Event Argument Extraction Architecture.
