Bayesian Deep Count Regression and Anomaly Detection: Evidence from GDELT Event Panels

Hsin-Hsiung Huang, Yuh-Haur Chen, Mahlon Scott

Abstract

The Global Database of Events, Language and Tone (GDELT) provides geolocated event records that can be aggregated into weekly spatiotemporal panels of event counts across regions, actors, and event types. These panels are typically sparse, bursty, and overdispersed, so calibrated probabilistic forecasting is essential for monitoring rare surges. We propose Bayesian count regression pipelines that pair deterministic deep temporal encoders with negative binomial (NB2) and zero-inflated negative binomial (ZINB2) likelihood heads. Posterior predictive simulation yields predictive quantiles and right-tail probabilities that support both forecasting and anomaly scoring. For interpretable spillover attribution, we also fit a Bayesian generalised linear model with high-dimensional lagged cross-series predictors and a two-step screen-and-refit procedure under a three-parameter beta-normal (TPBN) shrinkage prior. To connect spillovers to directional statistics, active cross-region effects are mapped to geodesic bearings on the World Geodetic System 1984 ellipsoid (WGS84) and summarised using weighted circular moments, rose diagrams, and bearing-field maps. Simulations with known spillovers and conflict-panel case studies show accurate right-tail behaviour and a practical workflow for detecting and interpreting geopolitical shocks.
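As a rough illustration of how NB2 predictive quantiles and right-tail probabilities can serve as anomaly scores, the sketch below averages the exact NB2 distribution over a handful of posterior draws. The parameterisation (mean mu, dispersion alpha, variance mu + alpha * mu^2) is the standard NB2 form. The function names, the toy draws, and the use of exact summation instead of simulation are assumptions made for this example and are not taken from the paper.

```python
import math

def nb2_logpmf(y, mu, alpha):
    """Log pmf of NB2 with mean mu > 0 and dispersion alpha > 0
    (variance = mu + alpha * mu**2)."""
    r = 1.0 / alpha  # NB "size" parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def nb2_cdf(y, mu, alpha):
    """P(Y <= y) by direct summation of the pmf (fine for small counts)."""
    if y < 0:
        return 0.0
    return min(1.0, sum(math.exp(nb2_logpmf(k, mu, alpha)) for k in range(y + 1)))

def predictive_tail_prob(y_obs, draws):
    """Posterior predictive P(Y >= y_obs), averaged over (mu, alpha) draws."""
    return sum(1.0 - nb2_cdf(y_obs - 1, mu, a) for mu, a in draws) / len(draws)

def predictive_quantile(q, draws, y_max=10_000):
    """Smallest y whose posterior predictive CDF reaches q."""
    cdf = 0.0
    for y in range(y_max + 1):
        cdf += sum(math.exp(nb2_logpmf(y, mu, a)) for mu, a in draws) / len(draws)
        if cdf >= q:
            return y
    raise ValueError("quantile not reached below y_max")

# Hypothetical posterior draws of (mu, alpha) for one series and one week.
draws = [(2.0, 0.5), (2.5, 0.4), (3.0, 0.6)]
y_obs = 14                                   # hypothetical observed count
upper = predictive_quantile(0.975, draws)    # 97.5% predictive upper bound
score = predictive_tail_prob(y_obs, draws)   # small value suggests a surge
is_exceedance = y_obs > upper
```

A simple check on the formulas: with mu = 1 and alpha = 1 the NB2 reduces to a geometric distribution, so P(Y >= 3) = 0.125. A ZINB2 head would additionally mix in a point mass at zero, which this sketch omits.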

Paper Structure

This paper contains 46 sections, 42 equations, 8 figures, 3 tables, and 4 algorithms.

Figures (8)

  • Figure 1: GDELT events by Action geolocation in February 2016, colored by Actor 1 country. Spatial regions are shown in black. Imagery source: Esri World Imagery.
  • Figure 2: Modeling workflow. All pipelines share the same count likelihood (NB2 or ZINB2). The sparse GLM uses TPBN shrinkage and a Two-Step refit for interpretable cross-series spillovers, which are mapped to geodesic bearings for directional summaries. The hybrid pipelines replace high-dimensional cross-series predictors with learned embeddings from TFT or TSMixer and fit a Bayesian likelihood head for calibrated posterior predictive distributions.
  • Figure 3: Out-of-sample predictive performance on the 50-step test set for Dense (left column) and Sparse (right column) targets. Shaded regions are rolling 95% posterior predictive intervals and solid lines are posterior predictive medians. True observations are plotted as points. Observations exceeding the 97.5% upper predictive boundary are highlighted to denote right-tail exceedances. The y-axis scales are unified within each column to facilitate direct comparisons of interval widths across models.
  • Figure 4: One-step-ahead posterior predictive summaries for Israel-initiated Fight events (CAMEO 19) in the target grid cell during the test period. The solid line is the posterior predictive median and the shaded band is the 95% posterior predictive interval. Points are observed weekly event counts. Points above the 97.5% predictive upper bound are highlighted. The vertical reference line marks October 2023.
  • Figure 5: Geographic map of selected lag-1 spillover sources for Israeli Assault (CAMEO 18) and Fight (CAMEO 19) targets in the Israel-Palestine panel. Points mark source cell centroids. Point size is proportional to the absolute value of the posterior mean cross-series coefficient. Arrows indicate geodesic bearings from each source centroid to the target centroid. Arrow colour indicates the sign of the posterior mean effect.
  • ...and 3 more figures
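
The weighted circular moments used to summarise spillover bearings can be sketched as follows. The sketch takes bearings in degrees as given (the paper computes them as geodesic bearings on WGS84, which is not reproduced here) and weights each bearing by a non-negative number, for example the absolute posterior mean coefficient shown as point size in Figure 5. That choice of weights and the function name are assumptions for illustration only.

```python
import math

def weighted_circular_moments(bearings_deg, weights):
    """Return (mean direction in degrees in [0, 360),
    mean resultant length in [0, 1]) for weighted bearings."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Weighted averages of the unit vectors pointing along each bearing.
    c = sum(w * math.cos(math.radians(b)) for b, w in zip(bearings_deg, weights)) / total
    s = sum(w * math.sin(math.radians(b)) for b, w in zip(bearings_deg, weights)) / total
    mean_dir = math.degrees(math.atan2(s, c)) % 360.0
    resultant = math.hypot(c, s)  # near 1: concentrated, near 0: dispersed
    return mean_dir, resultant

# Hypothetical bearings (degrees clockwise from north) and weights.
bearings = [350.0, 10.0, 20.0]
weights = [0.8, 0.5, 0.2]
mean_dir, resultant = weighted_circular_moments(bearings, weights)
```

Averaging unit vectors rather than raw angles avoids the wrap-around problem at north: bearings of 350 and 10 degrees average to a direction near 0, not 180. When the resultant length is close to zero the mean direction is poorly defined and should not be interpreted.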