CEHA: A Dataset of Conflict Events in the Horn of Africa

Rui Bai, Di Lu, Shihao Ran, Elizabeth Olson, Hemank Lamba, Aoife Cahill, Joel Tetreault, Alex Jaimes

TL;DR

CEHA introduces a region-focused benchmark dataset for conflict events in the Horn of Africa, addressing gaps in fine-grained event-type labels. Built from ACLED and GDELT, CEHA contains 500 English event descriptions annotated for event relevance and four region-specific event types, with rigorous annotation guidelines and quality control. The authors establish baselines using supervised models (BERT, RoBERTa, T5) and prompt-based LLMs (Mixtral, Mistral, DBRX, GPT-4o, Llama3-70B) on two tasks, binary relevance classification and multi-label event-type classification, highlighting the advantages and limitations of each approach in low-resource settings. Findings show that large language models, especially with few-shot prompting, offer competitive performance, yet climate-related events remain challenging due to data sparsity, underscoring the need for broader resource development and multilingual expansion to support AI for Social Good (AI4SG) in crisis-prone regions.

Abstract

Natural Language Processing (NLP) of news articles can play an important role in understanding the dynamics and causes of violent conflict. Despite the availability of datasets categorizing various conflict events, the existing labels often do not cover all of the fine-grained violent conflict event types relevant to areas like the Horn of Africa. In this paper, we introduce a new benchmark dataset, Conflict Events in the Horn of Africa region (CEHA), and propose a new task for identifying violent conflict events using online resources with this dataset. The dataset consists of 500 English event descriptions regarding conflict events in the Horn of Africa region, with fine-grained event-type definitions that emphasize the cause of the conflict. The dataset categorizes the key types of conflict risk according to specific areas required by stakeholders in the Humanitarian-Peace-Development Nexus. Additionally, we conduct extensive experiments on two tasks supported by this dataset: Event-relevance Classification and Event-type Classification. Our baseline models demonstrate the challenging nature of these tasks and the usefulness of our dataset for model evaluation in low-resource settings with a limited amount of training data.
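To make the two tasks concrete, the following is a minimal sketch of how an annotated example and its evaluation might be represented. It is an illustration under stated assumptions, not the authors' code: the `Example` record, the label names in `EVENT_TYPES`, and the choice of F1 and micro-averaged F1 as metrics are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical label names for illustration only. The dataset defines four
# event types, but their exact names are not given in this summary.
EVENT_TYPES = ("type_a", "type_b", "type_c", "climate_related")


@dataclass
class Example:
    text: str                              # English event description
    relevant: bool                         # Event-relevance label (binary)
    event_types: frozenset = frozenset()   # Event-type labels (multi-label)


def binary_f1(gold, pred):
    """F1 for the positive class of a binary task (assumed metric)."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def micro_f1(gold_sets, pred_sets):
    """Micro-averaged F1 over label sets for a multi-label task (assumed metric)."""
    tp = sum(len(g & p) for g, p in zip(gold_sets, pred_sets))
    fp = sum(len(p - g) for g, p in zip(gold_sets, pred_sets))
    fn = sum(len(g - p) for g, p in zip(gold_sets, pred_sets))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Toy data, invented for this sketch.
examples = [
    Example("Clashes over grazing land.", True, frozenset({"type_a", "climate_related"})),
    Example("A football match was postponed.", False),
]
relevance_pred = [True, True]
type_pred = [frozenset({"type_a"}), frozenset()]

relevance_score = binary_f1([e.relevant for e in examples], relevance_pred)
type_score = micro_f1([e.event_types for e in examples], type_pred)
```

Keeping relevance as a single boolean and event types as a set mirrors the split between a binary decision and a multi-label one: an irrelevant description simply carries an empty label set.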

Paper Structure

This paper contains 16 sections and 10 tables.