ConfliBERT: A Language Model for Political Conflict

Patrick T. Brandt, Sultan Alsarra, Vito J. D'Orazio, Dagmar Heintze, Latifur Khan, Shreyas Meher, Javier Osorio, Marcus Sianan

TL;DR

ConfliBERT demonstrates that a domain-specific BERT model trained on conflict and political violence data can outperform larger general-purpose LLMs on key information-extraction tasks in political conflict texts. By handling binary relevance classification, multi-class event typing, and named-entity recognition within a unified framework, it reduces the annotation burden and enables faster, more accurate processing of large corpora. Across the BBC, re3d, and GTD datasets, ConfliBERT delivers superior accuracy, robustness, and computational efficiency, and it is available as open source with multilingual variants. These results suggest significant practical value for political science research, real-time conflict monitoring, and policy analysis, while highlighting opportunities for ontology extension and continual learning. The work underscores the advantage of domain-informed NLP for constructing structured event data in international relations and conflict studies.

Abstract

Conflict scholars have used rule-based approaches to extract information about political violence from news reports and texts. Recent Natural Language Processing developments move beyond rigid rule-based approaches. We review our recent ConfliBERT language model (Hu et al. 2022) for processing political and violence-related texts. The model can be used to extract actor and action classifications from texts about political conflict. When fine-tuned, results show that ConfliBERT has superior performance in accuracy, precision, and recall over other large language models (LLMs) like Google's Gemma 2 (9B), Meta's Llama 3.1 (7B), and Alibaba's Qwen 2.5 (14B) within its relevant domains. It is also hundreds of times faster than these more generalist LLMs. These results are illustrated using texts from the BBC, re3d, and the Global Terrorism Database (GTD).

Paper Structure

This paper contains 14 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: ROC and AUC for each LLM and event type. Curves along the northwestern edge are better.
  • Figure 2: Precision-recall curves for each LLM and event type. Curves along the northeastern edge are better.
  • Figure 3: F-scores across cutoffs for each event type model. Higher curves are better.
  • Figure 4: Cumulative number of predicted events, 2017–2021, by type and model.
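
Figures 2 and 3 compare models by precision, recall, and F-score as the classification cutoff varies. As a rough illustration of how a single point on such a curve is obtained, the sketch below computes all three metrics for one event type at a few cutoffs. The function name, the toy labels, and the toy scores are hypothetical and are not taken from the paper; this is a minimal sketch of the standard definitions, not the authors' evaluation code.

```python
def precision_recall_f1(y_true, scores, cutoff):
    """Return (precision, recall, F1) when scores >= cutoff are labeled positive.

    y_true: list of 0/1 gold labels for one event type (hypothetical data).
    scores: list of model probabilities for the positive class.
    """
    predicted = [s >= cutoff for s in scores]
    tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(y_true, predicted) if y == 1 and not p)

    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: six documents, three of which truly describe the event type.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]

for cutoff in (0.25, 0.5, 0.75):
    p, r, f = precision_recall_f1(y_true, scores, cutoff)
    print(f"cutoff={cutoff:.2f} precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Sweeping the cutoff over a fine grid and plotting F1 against it gives a curve of the kind described for Figure 3; plotting precision against recall over the same grid gives a curve of the kind described for Figure 2.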