From Newswire to Nexus: Using text-based actor embeddings and transformer networks to forecast conflict dynamics
Mihai Croicu, Simon Polichinel von der Maase
TL;DR
The paper forecasts conflict dynamics at the actor level by combining text-based actor embeddings with transformer-based forecasting, addressing a gap in predicting escalation and de-escalation. It builds a hybrid training corpus by aligning a large Factiva newswire collection with UCDP GED event data, retrains a domain-specific encoder (ConfliBERT), and then fine-tunes larger language backbones (DeBERTa and Mistral) to predict a four-state dynamic at the dyad-month level. The methodology introduces two data-augmentation strategies, low-context digests and high-context (RAG) digests, to give the models context-rich inputs, and uses a Gaussian-process-based target measure to capture dynamics. The encoder-based DeBERTa models outperform a strong baseline on nowcasting and 1-month forecasts, with diminishing gains at longer horizons, while Mistral underperforms. These results highlight biases in the data corpus and the need for more integrated, end-to-end approaches for robust long-horizon predictions.
Abstract
This study advances the field of conflict forecasting by using text-based actor embeddings with transformer models to predict dynamic changes in violent conflict patterns at the actor level. More specifically, we combine newswire texts with structured conflict event data and leverage recent advances in Natural Language Processing (NLP) to forecast escalations and de-escalations among conflicting actors, such as governments, militias, separatist movements, and terrorists. This new approach accurately and promptly captures the inherently volatile patterns of violent conflicts, something existing methods have not been able to do. To create this framework, we first curated and annotated a vast international newswire corpus, leveraging hand-labeled event data from the Uppsala Conflict Data Program. With this hybrid dataset, our models can incorporate the textual context of news sources along with the precision and detail of structured event data. This combination enables us to make both dynamic and granular predictions about conflict developments. We validate our approach through rigorous back-testing against historical events, demonstrating superior out-of-sample predictive power. We find that our approach is effective in identifying and predicting phases of conflict escalation and de-escalation, surpassing the capabilities of traditional models. By focusing on actor interactions, we aim to provide actionable insights to policymakers, humanitarian organizations, and peacekeeping operations in order to enable targeted and effective intervention strategies.
