ConfliBERT: A Language Model for Political Conflict
Patrick T. Brandt, Sultan Alsarra, Vito J. D'Orazio, Dagmar Heintze, Latifur Khan, Shreyas Meher, Javier Osorio, Marcus Sianan
TL;DR
ConfliBERT demonstrates that a domain-specific BERT model trained on conflict and political violence data can outperform larger general-purpose LLMs on key information-extraction tasks in political conflict texts. By tackling binary relevance classification, multi-class event typing, and named-entity recognition within a unified framework, it reduces annotation burden and enables faster, more accurate processing of large corpora. Across the BBC, re3d, and GTD datasets, ConfliBERT delivers superior accuracy, robustness, and computational efficiency, with open-source availability and multilingual variants. These results suggest significant practical value for political science research, real-time conflict monitoring, and policy analysis, while highlighting opportunities for ontology extension and continual learning. The work underscores the advantage of domain-informed NLP for structured event data construction in international relations and conflict studies.
Abstract
Conflict scholars have used rule-based approaches to extract information about political violence from news reports and texts. Recent Natural Language Processing developments move beyond rigid rule-based approaches. We review our recent ConfliBERT language model (Hu et al. 2022) for processing political and violence-related texts. The model can be used to extract actor and action classifications from texts about political conflict. When fine-tuned, ConfliBERT achieves higher accuracy, precision, and recall within its relevant domains than other large language models (LLMs) such as Google's Gemma 2 (9B), Meta's Llama 3.1 (7B), and Alibaba's Qwen 2.5 (14B). It is also hundreds of times faster than these more generalist LLMs. These results are illustrated using texts from the BBC, re3d, and the Global Terrorism Database (GTD).
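For readers less familiar with the evaluation metrics named above, the following is a minimal sketch of how accuracy, precision, and recall can be computed for a binary relevance task (1 = conflict-relevant, 0 = not relevant). The function name `binary_metrics` and the toy labels are assumptions made for illustration only; they are not the authors' evaluation code, and the labels are not data from the BBC, re3d, or GTD corpora.

```python
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, precision, and recall for 0/1 labels (1 = relevant)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    return {
        "accuracy": (tp + tn) / len(y_true),
        # Fall back to 0.0 when a denominator is empty (no predicted or no true positives).
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels invented for illustration; hypothetical, not results from the paper.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(gold, pred)
print(scores)
```

Precision penalizes irrelevant texts flagged as relevant, while recall penalizes relevant texts that are missed, so reporting both alongside accuracy gives a fuller picture when relevant documents are rare in a corpus.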
