GUARD: Constructing Realistic Two-Player Matrix and Security Games for Benchmarking Game-Theoretic Algorithms
Noah Krever, Jakub Černý, Moïse Blanchard, Christian Kroer
TL;DR
GUARD addresses the difficulty of benchmarking game-theoretic algorithms on realistic, security-inspired settings by generating data-driven two-player games from public sources (e.g., Movebank, OpenStreetMap, census data). It defines a three-tier framework (Graph Game, Security Game, Domain-Specific Games) with NFG and schedule-form variants, and offers a library of preconfigured instances (GSGs and ISGs) that can be exported to formats compatible with OpenSpiel and Gambit. The authors establish theoretical limitations of random benchmarks and show empirically that realistic instances yield richer equilibria, more diverse supports, and more stable convergence across standard solvers, underscoring the need for realistic benchmarks in algorithmic game theory. The framework enables reproducible, domain-aligned benchmarking and provides ready-to-use data-driven instances that can inform security planning and policy-relevant research. The authors note scalability and data-fidelity constraints and suggest future extensions to richer constraints and additional domains.
Abstract
Game-theoretic algorithms are commonly benchmarked on recreational games, classical constructs from economic theory such as congestion and dispersion games, or entirely random game instances. While the past two decades have seen the rise of security games, which are grounded in real-world scenarios like patrolling and infrastructure protection, their practical evaluation has been hindered by limited access to the datasets used to generate them. In particular, although the structural components of these games (e.g., patrol paths derived from maps) can be replicated, the critical data defining target values, which are central to utility modeling, remain inaccessible. In this paper, we introduce a flexible framework that leverages open-access datasets to generate realistic matrix and security game instances. These include animal movement data for modeling anti-poaching scenarios and demographic and infrastructure data for infrastructure protection. Our framework allows users to customize utility functions and game parameters, while also offering a suite of preconfigured instances. We provide theoretical results highlighting the degeneracy and limitations of benchmarking on random games, and empirically compare our generated games against random baselines across a variety of standard algorithms for computing Nash and Stackelberg equilibria, including linear programming, incremental strategy generation, and self-play with no-regret learners.
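
To make "self-play with no-regret learners" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a zero-sum matrix game and uses regret matching as the no-regret learner; the abstract names neither the learner nor the game class, and the function names and the 2x2 payoff matrix are hypothetical, chosen only for illustration. In zero-sum games the time-averaged strategies of two no-regret learners approach a Nash equilibrium, and the exploitability function below measures how far they still are from one.

```python
def regret_matching_strategy(cumulative_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    n = len(cumulative_regret)
    return [1.0 / n] * n  # no positive regret yet: play uniformly


def self_play(payoff, iterations=20000):
    """Run regret matching for both players of a zero-sum matrix game.

    payoff[i][j] is the row player's utility; the column player receives
    its negation (an assumption of this sketch). Returns the time-averaged
    strategies of the row and column player.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols

    for _ in range(iterations):
        x = regret_matching_strategy(row_regret)
        y = regret_matching_strategy(col_regret)

        # Expected utility of every pure action against the opponent's mix.
        row_values = [sum(payoff[i][j] * y[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * x[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_ev = sum(x[i] * row_values[i] for i in range(n_rows))
        col_ev = sum(y[j] * col_values[j] for j in range(n_cols))

        # Accumulate regrets and the strategies that were actually played.
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_ev
            row_sum[i] += x[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_ev
            col_sum[j] += y[j]

    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])


def exploitability(payoff, x, y):
    """Sum of both players' best-response gains; zero at a Nash equilibrium."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(n_cols))
                   for i in range(n_rows))
    worst_for_row = min(sum(payoff[i][j] * x[i] for i in range(n_rows))
                        for j in range(n_cols))
    return best_row - worst_for_row


# Hypothetical 2x2 zero-sum game used only to exercise the sketch.
PAYOFF = [[2.0, -1.0], [-1.0, 1.0]]
x_avg, y_avg = self_play(PAYOFF)
print("row strategy:", x_avg)
print("column strategy:", y_avg)
print("exploitability:", exploitability(PAYOFF, x_avg, y_avg))
```

Exploitability is a natural quantity to track when comparing convergence on generated versus random instances, since it is defined for any pair of strategies regardless of which solver produced them. Security games are in general not zero-sum, so this sketch covers only the simplest case.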
