Grasper: A Generalist Pursuer for Pursuit-Evasion Problems
Pengdeng Li, Shuxin Li, Xinrun Wang, Jakub Cerny, Youzhi Zhang, Stephen McAleer, Hau Chan, Bo An
TL;DR
Grasper tackles the challenge of solving pursuit-evasion games (PEGs) with varying initial conditions by conditioning pursuer policies on PEG-specific graphs. It combines a graph neural encoder with a hypernetwork to generate a PEG-conditioned base policy, and employs a three-stage training pipeline (GraphMAE-based pre-pretraining, heuristic-guided multi-task pre-training, and PSRO-based fine-tuning) to achieve fast, generalizable best-response (BR) policy learning. Across synthetic and real-world maps, Grasper outperforms strong baselines in both solution quality and generalizability, and its pre-training components also stabilize training. This approach enables practical deployment of pursuer policies that adapt to diverse and dynamic urban PEG scenarios. By generalizing PEG solutions beyond fixed initial conditions, it has potential impact on real-time security resource allocation and automated pursuit planning.
Abstract
Pursuit-evasion games (PEGs) model interactions between a team of pursuers and an evader in graph-based environments such as urban street networks. Recent advancements have demonstrated the effectiveness of the pre-training and fine-tuning paradigm in Policy-Space Response Oracles (PSRO) to improve scalability in solving large-scale PEGs. However, these methods primarily focus on specific PEGs with fixed initial conditions, whereas in real-world scenarios the initial conditions may vary substantially, which significantly limits the applicability of these methods. To address this issue, we introduce Grasper, a GeneRAlist purSuer for Pursuit-Evasion pRoblems, capable of efficiently generating pursuer policies tailored to specific PEGs. Our contributions are threefold. First, we present a novel architecture that offers high-quality solutions for diverse PEGs, comprising critical components such as (i) a graph neural network (GNN) to encode PEGs into hidden vectors, and (ii) a hypernetwork to generate pursuer policies based on these hidden vectors. Second, we develop an efficient three-stage training method involving (i) a pre-pretraining stage for learning robust PEG representations through self-supervised graph learning techniques like GraphMAE, (ii) a pre-training stage utilizing heuristic-guided multi-task pre-training (HMP), where heuristic-derived reference policies (e.g., through Dijkstra's algorithm) regularize pursuer policies, and (iii) a fine-tuning stage that employs PSRO to generate pursuer policies on designated PEGs. Finally, we perform extensive experiments on synthetic and real-world maps, showcasing Grasper's significant superiority over baselines in terms of solution quality and generalizability. We demonstrate that Grasper provides a versatile approach for solving pursuit-evasion problems across a broad range of scenarios, enabling practical deployment in real-world situations.
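To make the heuristic-guided pre-training idea more concrete, the following is a minimal sketch of how a Dijkstra-derived reference policy could be built and compared against a learned policy. It is an illustration under stated assumptions, not the paper's implementation: the function names (`dijkstra`, `reference_policy`, `kl_divergence`), the toy graph, the choice of a single target node for the pursuer, and the use of a KL term as the regularizer are all hypothetical.

```python
import heapq
import math


def dijkstra(graph, source):
    """Shortest-path distances from `source`.

    graph: dict mapping node -> {neighbor: non-negative edge weight}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def reference_policy(graph, pursuer, target):
    """Hypothetical heuristic: spread probability uniformly over the
    neighbors of `pursuer` that lie on a shortest path to `target`.
    Assumes an undirected graph, so distances to and from `target` agree.
    """
    if pursuer == target:
        return {pursuer: 1.0}  # already at the target: stay (assumption)
    dist = dijkstra(graph, target)
    cost = {v: w + dist.get(v, math.inf) for v, w in graph[pursuer].items()}
    best = min(cost.values())
    greedy = [v for v, c in cost.items() if c == best]
    return {v: 1.0 / len(greedy) for v in greedy}


def kl_divergence(ref, policy, eps=1e-12):
    """KL(ref || policy): one possible regularizer pulling the learned
    policy toward the reference (assumed form, not taken from the paper)."""
    return sum(p * math.log(p / max(policy.get(a, 0.0), eps))
               for a, p in ref.items() if p > 0.0)


# Toy undirected street graph with unit edge weights (illustrative only).
graph = {
    "a": {"b": 1, "c": 1},
    "b": {"a": 1, "d": 1},
    "c": {"a": 1, "d": 1},
    "d": {"b": 1, "c": 1, "e": 1},
    "e": {"d": 1},
}

ref = reference_policy(graph, "a", "e")
learned = {"b": 0.25, "c": 0.75}  # stand-in for a network's action distribution
penalty = kl_divergence(ref, learned)
```

In a pre-training loop, a term like `penalty` would be added to the reinforcement learning loss with some weight, so that early in training the pursuer is nudged toward shortest-path behavior on each sampled PEG.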
