Game-Theory-Assisted Reinforcement Learning for Border Defense: Early Termination based on Analytical Solutions

Goutam Das, Michael Dorothy, Kyle Volle, Daigo Shishika

Abstract

Game theory provides the gold standard for analyzing adversarial engagements, offering strong optimality guarantees. However, these guarantees often become brittle when assumptions such as perfect information are violated. Reinforcement learning (RL), by contrast, is adaptive but can be sample-inefficient in large, complex domains. This paper introduces a hybrid approach that leverages game-theoretic insights to improve RL training efficiency. We study a border defense game with limited perceptual range, where defender performance depends on both search and pursuit strategies, making classical differential game solutions inapplicable. Our method employs the Apollonius Circle (AC) to compute equilibrium in the post-detection phase, enabling early termination of RL episodes without learning pursuit dynamics. This allows RL to concentrate on learning search strategies while guaranteeing optimal continuation after detection. Across single- and multi-defender settings, this early termination method yields 10-20% higher rewards, faster convergence, and more efficient search trajectories. Extensive experiments validate these findings and demonstrate the overall effectiveness of our approach.
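The early-termination idea in the abstract can be sketched as a per-step check: once the attacker is inside any defender's sensing range, the episode stops and the remaining pursuit is scored analytically instead of being simulated. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `detected`, `early_termination`, `sensing_radius`, and `payoff_fn` are hypothetical, and detection is assumed to be a simple distance test.

```python
import math

def detected(x_a, defenders, sensing_radius):
    """True if the attacker lies within sensing range of any defender.

    Assumption: detection is a plain Euclidean distance test.
    """
    return any(
        math.hypot(x_a[0] - d[0], x_a[1] - d[1]) <= sensing_radius
        for d in defenders
    )

def early_termination(x_a, defenders, sensing_radius, payoff_fn):
    """Return (done, terminal_value) for the current state.

    payoff_fn is a hypothetical callable standing in for the analytical
    post-detection payoff (for example, an Apollonius-circle computation).
    Before detection the episode continues and no terminal value is set.
    """
    if detected(x_a, defenders, sensing_radius):
        return True, payoff_fn(x_a, defenders)
    return False, None
```

How the terminal value is scaled into a reward is left open here, since that choice is not stated in this summary.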

Paper Structure

This paper contains 28 sections, 4 theorems, 37 equations, 7 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

The closed disk $\mathcal{D}(\mathbf{x}_A, \mathbf{x}_{D_i}, \nu_i) = \{\mathbf{p} : \|\mathbf{p} - \mathbf{c}_i\| \leq r_i\}$ is the attacker's dominance region, that is, the set of points the attacker can reach no later than the defender.
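The center $\mathbf{c}_i$ and radius $r_i$ are not spelled out in this summary. A minimal sketch follows, assuming $\nu_i$ is the attacker-to-defender speed ratio with $0 < \nu_i < 1$ and a zero capture radius, in which case the standard Apollonius-circle formulas apply. The function names are hypothetical, and `lowest_point` assumes the target line lies in the direction of decreasing $y$, as in Figure 1.

```python
import math

def apollonius_disk(x_a, x_d, nu):
    """Center and radius of the attacker's dominance disk.

    Assumptions: nu = v_A / v_D with 0 < nu < 1, and point capture.
    The disk is the set of p with |p - x_a| <= nu * |p - x_d|.
    """
    if not 0.0 < nu < 1.0:
        raise ValueError("this sketch assumes 0 < nu < 1")
    k = 1.0 - nu * nu
    cx = (x_a[0] - nu * nu * x_d[0]) / k
    cy = (x_a[1] - nu * nu * x_d[1]) / k
    r = nu * math.hypot(x_a[0] - x_d[0], x_a[1] - x_d[1]) / k
    return (cx, cy), r

def lowest_point(center, radius):
    """Point of the disk with the minimum y-coordinate."""
    return (center[0], center[1] - radius)

# Usage: attacker 10 units above a defender that is twice as fast.
center, radius = apollonius_disk((0.0, 10.0), (0.0, 0.0), 0.5)
capture = lowest_point(center, radius)
```

In the single-defender case, the $y$-coordinate of `capture` is the deepest penetration the attacker can guarantee under these assumptions.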

Figures (7)

  • Figure 1: Illustration of the border-defense game environment. An attacker ($A$) spawns in the designated area at the top and attempts to reach the target line ($\mathcal{T}$) at the bottom. The defender team, composed of agents with varying sensing ($D_1$, $D_2$) and capture ($D_2$, $D_3$) capabilities, must coordinate to intercept the attacker.
  • Figure 2: Geometric representation of the optimal capture strategy in the deterministic pursuit phase (Phase II). The Apollonius Circle ($\mathcal{A}$) defines the boundary of the attacker's dominance region. The optimal capture point is the location on the circle with the minimum y-coordinate, and the optimal trajectories for both agents converge at this point.
  • Figure 3: Optimal capture point computation for a multi-defender scenario. The attacker's reachable region is the intersection of the individual Apollonius Circles ($\mathcal{A}_1, \mathcal{A}_2$). The optimal interception point, marked by the red star, corresponds to the minimum y-coordinate within this intersection, representing the game-theoretic payoff.
  • Figure 4: Value function level-set for optimal defender placement. The contours illustrate the Nash equilibrium payoff ($J^*$) as a function of a third defender's position, given two fixed defenders. Blue regions indicate configurations that yield a better payoff for the attacker, while red regions are more favorable for the defenders.
  • Figure 5: Training performance in the 1v1 homogeneous scenario. The GT-assisted policy (blue) converges faster and reaches a higher mean reward ($\mu=0.605$) than the end-to-end baseline (red, $\mu=0.545$), a relative improvement of about 11%.
  • ...and 2 more figures
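The multi-defender computation described for Figure 3 (minimum $y$ over the intersection of the individual disks) can be sketched by enumerating candidate points. Because the intersection is convex, the minimizer is either the lowest point of one disk or an intersection point of two circles. This is an illustrative sketch, not the paper's algorithm; the function names and the tolerance `tol` are assumptions.

```python
import math

def circle_intersections(c1, r1, c2, r2):
    """Intersection points of two circles (empty list if none)."""
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    d = math.hypot(dx, dy)
    if d == 0.0 or d > r1 + r2 or d < abs(r1 - r2):
        return []
    a = (r1 * r1 - r2 * r2 + d * d) / (2.0 * d)
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))
    mx, my = c1[0] + a * dx / d, c1[1] + a * dy / d
    return [(mx + h * dy / d, my - h * dx / d),
            (mx - h * dy / d, my + h * dx / d)]

def lowest_point_in_intersection(disks, tol=1e-9):
    """Minimum-y point in the intersection of disks [((cx, cy), r), ...].

    Returns None if no candidate lies inside every disk.
    """
    def inside_all(p):
        return all(math.hypot(p[0] - c[0], p[1] - c[1]) <= r + tol
                   for c, r in disks)

    # Candidates: the lowest point of each disk, plus pairwise crossings.
    candidates = [(c[0], c[1] - r) for c, r in disks]
    for i in range(len(disks)):
        for j in range(i + 1, len(disks)):
            candidates += circle_intersections(*disks[i], *disks[j])
    feasible = [p for p in candidates if inside_all(p)]
    return min(feasible, key=lambda p: p[1]) if feasible else None
```

The enumeration checks on the order of $n^2$ candidates against $n$ disks, which is adequate for small defender teams.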

Theorems & Definitions (12)

  • Definition 1: Phase I - Stochastic Search
  • Definition 2: Phase II - Deterministic Pursuit
  • Remark 1: Search-Pursuit Trade-off
  • Lemma 1: Dominance Region [Isaacs1965, dorothy2024one]
  • Theorem 1: Nash Equilibrium Payoff
  • proof
  • Remark 2: Effect of Nonzero Capture Radius
  • Definition 3: Multi-Defender Dominance Region
  • Proposition 1: Multi-Defender Nash Equilibrium
  • Remark 3: Computational Complexity
  • ...and 2 more