Tree-Based Stochastic Optimization for Solving Large-Scale Urban Network Security Games

Shuxin Zhuang, Linjian Meng, Shuxin Li, Minming Li, Youzhi Zhang

TL;DR

The paper tackles the intractability of computing Nash equilibria (NE) in large-scale urban network security games (UNSGs), which stems from their combinatorial action spaces. It introduces Tree-based Stochastic Optimization (TSO), which uses a tree-based action representation to sample actions from non-enumerable action spaces and a sample-and-prune mechanism to avoid convergence to suboptimal local optima. By deriving a tree-based version of the unbiased Nash Advantage Loss (NAL) and proving that its gradient vanishes exactly where the gradient of the original NAL does, the method enables unbiased gradient-based optimization for NE in UNSGs and scales to graphs with millions of potential strategies. Empirical results show TSO outperforming PSRO and competing baselines across small-, medium-, and large-scale UNSGs, including settings with asymmetric payoffs and decentralized defenders, while also training faster. This demonstrates a practical, scalable approach to NE finding in complex security games, with potential applicability to other large-scale normal-form games. The entropy-regularized utility $u_i^{\tau}(\boldsymbol{x}) = u_i(\boldsymbol{x}) - \tau \boldsymbol{x}_i^{\top} \log \boldsymbol{x}_i$ and related formulations underpin the framework's entropy-regularized, low-variance optimization core.
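The entropy-regularized utility can be evaluated directly for a small two-player game. The sketch below is illustrative only: it assumes a bimatrix game in which $u_1 = \boldsymbol{x}_1^{\top} A \boldsymbol{x}_2$ and $u_2 = \boldsymbol{x}_1^{\top} B \boldsymbol{x}_2$ (the names `A`, `B`, and `regularized_utilities` are hypothetical), and it uses the convention $0 \log 0 = 0$. It shows the formula itself, not how TSO estimates it in UNSGs.

```python
import numpy as np


def neg_entropy(x):
    """Return x^T log x (the negative Shannon entropy), using 0 * log 0 = 0."""
    x = np.asarray(x, dtype=float)
    support = x > 0
    return float(np.sum(x[support] * np.log(x[support])))


def regularized_utilities(A, B, x1, x2, tau):
    """Entropy-regularized utilities u_i^tau(x) = u_i(x) - tau * x_i^T log x_i
    for a two-player game with u_1 = x1^T A x2 and u_2 = x1^T B x2."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    u1 = float(x1 @ A @ x2)
    u2 = float(x1 @ B @ x2)
    # Subtracting tau * x^T log x adds tau times the entropy of x_i.
    return u1 - tau * neg_entropy(x1), u2 - tau * neg_entropy(x2)


# Matching pennies (hypothetical example) at the uniform strategy profile.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
r1, r2 = regularized_utilities(A, -A, [0.5, 0.5], [0.5, 0.5], tau=0.1)
```

At the uniform profile both unregularized utilities are 0, so each regularized utility equals $\tau \log 2$: the entropy term rewards mixing, which is what makes the regularized objective smoother to optimize.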

Abstract

Urban Network Security Games (UNSGs), which model the strategic allocation of limited security resources on city road networks, are critical for urban safety. However, finding a Nash Equilibrium (NE) in large-scale UNSGs is challenging due to their massive and combinatorial action spaces. One common approach to addressing these games is the Policy-Space Response Oracle (PSRO) framework, which requires computing best responses (BR) at each iteration. However, computing exact BRs is impractical in large-scale games, and employing reinforcement learning to approximate BRs inevitably introduces errors, which limits the overall effectiveness of PSRO methods. Recent advancements in leveraging non-convex stochastic optimization to approximate an NE offer a promising alternative to the burdensome BR computation. However, applying existing stochastic optimization techniques with an unbiased loss function to UNSGs remains challenging because the action spaces are too vast to be effectively represented by neural networks. To address these issues, we introduce Tree-based Stochastic Optimization (TSO), a framework that bridges the gap between the stochastic optimization paradigm for NE-finding and the demands of UNSGs. Specifically, we employ a tree-based action representation that maps the whole action space onto a tree structure, addressing the difficulty neural networks face in representing actions when the action space cannot be enumerated. We then incorporate this representation into the loss function and theoretically demonstrate its equivalence to the unbiased loss function. To further enhance the quality of the converged solution, we introduce a sample-and-prune mechanism that reduces the risk of being trapped in suboptimal local optima. Extensive experimental results indicate the superiority of TSO over other baseline algorithms in addressing UNSGs.
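Why a tree sidesteps enumeration can be shown with a small sampler: a policy only has to produce a distribution over the children of the node it currently occupies, and the probability of a full action is the product of these conditional probabilities along its root-to-leaf path. This is a minimal sketch under stated assumptions, not the authors' implementation: the `CHILDREN` and `LOGITS` dictionaries are hypothetical stand-ins (in practice children would be generated on the fly from the road network and logits would come from a neural network), and the softmax-per-node parameterization is an assumption.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(children, logits, rng):
    """Walk from the root (empty prefix) to a leaf, sampling one child per node.
    Returns the sampled action (tuple of labels) and its probability."""
    node, prob = (), 1.0
    while node in children:  # internal nodes are exactly the keys of `children`
        probs = softmax(logits[node])
        k = rng.choices(range(len(probs)), weights=probs)[0]
        prob *= probs[k]
        node = node + (children[node][k],)
    return node, prob


def action_probability(action, children, logits):
    """Probability of a given action = product of conditionals along its path."""
    node, prob = (), 1.0
    for label in action:
        k = children[node].index(label)
        prob *= softmax(logits[node])[k]
        node = node + (label,)
    return prob


# Hypothetical toy tree with two attacker paths, (4, 2, 1, 3) and (5, 3);
# the logits are hand-picked stand-ins for a network's outputs.
CHILDREN = {(): [4, 5], (4,): [2], (4, 2): [1], (4, 2, 1): [3], (5,): [3]}
LOGITS = {(): [math.log(3.0), 0.0], (4,): [0.0], (4, 2): [0.0],
          (4, 2, 1): [0.0], (5,): [0.0]}

rng = random.Random(0)
action, prob = sample_action(CHILDREN, LOGITS, rng)
```

With these logits the root chooses vertex 4 with probability 0.75, so `action_probability((4, 2, 1, 3), CHILDREN, LOGITS)` is 0.75 and the other path gets 0.25. Only the nodes on the sampled path are ever visited, which is what keeps sampling tractable when the number of leaves is huge.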

Paper Structure

This paper contains 59 sections, 4 theorems, 25 equations, 8 figures, 7 tables, 1 algorithm.

Key Result

Proposition 1

For any edge $j$ in $\mathcal{T}_{i}$, the first-order gradient of the tree-based NAL equals zero if and only if the first-order gradient of NAL is $\bm{0}$, i.e., $\bm{g}_i = \bm{0}$, where $\bm{g}_i = [g_{i,0},\, g_{i,1},\, \ldots,\, g_{i, |\mathcal{A}_i|-1}]^\top$ collects one gradient entry for each of player $i$'s actions in $\mathcal{A}_i$. The detailed proof of this proposition is provided in the appendix.
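The "if" direction can be made plausible with the chain rule. The derivation below assumes, purely for illustration, that each action $a \in \mathcal{A}_i$ is a root-to-leaf path of $\mathcal{T}_i$, that its probability $x_{i,a}$ is the product of edge probabilities $p_{i,e}$ along that path, that edge $j$ carries a hypothetical parameter $\theta_{i,j}$, and that $g_{i,a}$ is the partial derivative of NAL with respect to $x_{i,a}$. Writing $\mathcal{L}_{\mathrm{tree}}$ (hypothetical notation) for the tree-based NAL obtained by composing NAL with this parameterization:

```latex
x_{i,a} = \prod_{e \in \mathrm{path}(a)} p_{i,e},
\qquad
\frac{\partial \mathcal{L}_{\mathrm{tree}}}{\partial \theta_{i,j}}
  = \sum_{a=0}^{|\mathcal{A}_i|-1} g_{i,a}\,
    \frac{\partial x_{i,a}}{\partial \theta_{i,j}}
  = \bm{g}_i^{\top}\,\frac{\partial \bm{x}_i}{\partial \theta_{i,j}} .
```

Hence $\bm{g}_i = \bm{0}$ makes the gradient for every edge vanish. The converse does not follow from the chain rule alone, since the Jacobian $\partial \bm{x}_i / \partial \bm{\theta}_i$ need not have full rank; that direction relies on the paper's own proof, and the paper's construction of the tree-based NAL may differ from this simple composition.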

Figures (8)

  • Figure 1: An example illustrating the construction of the attacker and defender action representation trees. Taking this graph structure as an example, red vertices are possible starting points for the attacker, and the green vertex is the attacker's target. The defense team consists of two defenders, each of whom can deploy a resource on one of the three blue edges. The attacker action [4,2,1,3] corresponds to the path from $z_{\mathrm{root}}$ to $z_4$ in the attacker tree (solid lines). Similarly, the defender action [(3,4), (3,2)] corresponds to a path from $z_{\mathrm{root}}$ to $z_2$ in the defender tree.
  • Figure 2: Small-Scale Game Experiment Results
  • Figure 3: Medium-Scale Games Experiment Results
  • Figure 4: Diverse Games Experiment Results
  • Figure 5: Ablation Experiment Results
  • ...and 3 more figures
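The construction in Figure 1 can be sketched as a prefix tree in which every root-to-leaf path spells out exactly one action. The instance below is hypothetical: only the attacker path [4, 2, 1, 3] and the blue edges (3, 4) and (3, 2) come from the caption, while the remaining edges, the start set `STARTS`, the target `TARGET`, and the third blue edge are invented. The sketch also assumes that attacker actions are simple paths ending at the target and that both defenders may pick the same edge; the paper's exact rules (for example, limits on path length) may differ.

```python
from itertools import product

# Hypothetical instance loosely modelled on Figure 1 (see assumptions above).
EDGES = [(4, 2), (2, 1), (1, 3), (3, 4), (3, 2), (5, 3)]
STARTS = [4, 5]                          # "red" start vertices (assumed)
TARGET = 3                               # "green" target vertex (assumed)
BLUE_EDGES = [(3, 4), (3, 2), (1, 3)]    # third blue edge is made up
NUM_DEFENDERS = 2


def adjacency(edges):
    """Undirected adjacency lists from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def simple_paths(adj, start, target):
    """Depth-first enumeration of simple paths from start that stop at target."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        for nxt in adj.get(node, []):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))


def build_tree(actions):
    """Prefix tree (nested dicts): each root-to-leaf path is one action."""
    root = {}
    for action in actions:
        node = root
        for step in action:
            node = node.setdefault(step, {})
    return root


def count_leaves(tree):
    """Number of root-to-leaf paths, i.e. number of represented actions."""
    return 1 if not tree else sum(count_leaves(c) for c in tree.values())


adj = adjacency(EDGES)
attacker_actions = [p for s in STARTS for p in simple_paths(adj, s, TARGET)]
attacker_tree = build_tree(attacker_actions)

# Each defender picks one blue edge; the tree has one level per defender.
defender_actions = list(product(BLUE_EDGES, repeat=NUM_DEFENDERS))
defender_tree = build_tree(defender_actions)
```

In this toy instance the attacker tree has 4 leaves (including the path 4 → 2 → 1 → 3) and the defender tree has 3 × 3 = 9, one per joint action such as ((3, 4), (3, 2)). The code enumerates leaves only to build a small example; the benefit of the representation is that a policy can choose among the children of one node at a time, so the full set of leaves never has to be materialized.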

Theorems & Definitions (6)

  • Proposition 1
  • Proposition 2
  • Proposition
  • proof
  • Proposition
  • proof