Game-Theory-Assisted Reinforcement Learning for Border Defense: Early Termination based on Analytical Solutions

Goutam Das; Michael Dorothy; Kyle Volle; Daigo Shishika

Abstract

Game theory provides the gold standard for analyzing adversarial engagements, offering strong optimality guarantees. However, these guarantees often become brittle when assumptions such as perfect information are violated. Reinforcement learning (RL), by contrast, is adaptive but can be sample-inefficient in large, complex domains. This paper introduces a hybrid approach that leverages game-theoretic insights to improve RL training efficiency. We study a border defense game with limited perceptual range, where defender performance depends on both search and pursuit strategies, making classical differential game solutions inapplicable. Our method employs the Apollonius Circle (AC) to compute equilibrium in the post-detection phase, enabling early termination of RL episodes without learning pursuit dynamics. This allows RL to concentrate on learning search strategies while guaranteeing optimal continuation after detection. Across single- and multi-defender settings, this early termination method yields 10-20% higher rewards, faster convergence, and more efficient search trajectories. Extensive experiments validate these findings and demonstrate the overall effectiveness of our approach.

Paper Structure (28 sections, 4 theorems, 37 equations, 7 figures, 1 table, 1 algorithm)

Introduction
Problem Formulation
Game Theoretic Analysis
Phase Decomposition and Strategic Coupling
Analytical Solution for the Pursuit Phase
Single Defender Case
Multiple Defender Case
GT-Assisted Reward Mechanism
Early Termination Principle
Undiscounted Payoff Equivalence
Computational and Learning Advantages
Impact on Learned Behaviors
Multi-Agent Reinforcement Learning Approach
Decentralized Partially Observable MDP Formulation
Observation and Action Spaces
...and 13 more sections

Key Result

Lemma 1

The closed disk $\mathcal{D}(\mathbf{x}_A, \mathbf{x}_{D_i}, \nu_i) = \{\mathbf{p} : \|\mathbf{p} - \mathbf{c}_i\| \leq r_i\}$ is the attacker's dominance region, that is, the set of points the attacker can reach no later than defender $D_i$.
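The lemma does not spell out $\mathbf{c}_i$ and $r_i$ on this page. As a minimal sketch, assume $\nu_i = v_A / v_{D_i}$ is the attacker-to-defender speed ratio with $0 < \nu_i < 1$ (an assumption, not confirmed by the text above). Under that assumption, the condition $\|\mathbf{p} - \mathbf{x}_A\| \leq \nu_i \|\mathbf{p} - \mathbf{x}_{D_i}\|$ expands to a disk with the standard Apollonius center and radius computed below. The function names are hypothetical, and the "lowest point" helper assumes the target line lies in the direction of decreasing $y$.

```python
import math


def apollonius_circle(x_a, x_d, nu):
    """Center and radius of the disk {p : |p - x_a| <= nu * |p - x_d|}.

    Assumption: nu = v_A / v_D with 0 < nu < 1 (attacker slower than defender).
    """
    if not 0.0 < nu < 1.0:
        raise ValueError("nu must lie strictly between 0 and 1")
    k = 1.0 - nu * nu
    # Center: (x_a - nu^2 * x_d) / (1 - nu^2)
    cx = (x_a[0] - nu * nu * x_d[0]) / k
    cy = (x_a[1] - nu * nu * x_d[1]) / k
    # Radius: nu * |x_a - x_d| / (1 - nu^2)
    r = nu * math.hypot(x_a[0] - x_d[0], x_a[1] - x_d[1]) / k
    return (cx, cy), r


def in_dominance_region(p, x_a, x_d, nu, tol=1e-9):
    """True if the attacker reaches p no later than the defender."""
    d_a = math.hypot(p[0] - x_a[0], p[1] - x_a[1])
    d_d = math.hypot(p[0] - x_d[0], p[1] - x_d[1])
    return d_a <= nu * d_d + tol


def lowest_point(center, radius):
    """Point of the disk with the minimum y-coordinate."""
    return (center[0], center[1] - radius)


if __name__ == "__main__":
    attacker, defender = (0.0, 10.0), (0.0, 0.0)
    center, radius = apollonius_circle(attacker, defender, 0.5)
    print(center, radius, lowest_point(center, radius))
```

The boundary check in `in_dominance_region` uses a small tolerance because points on the circle satisfy the inequality with equality, which floating-point rounding can otherwise break.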

Figures (7)

Figure 1: Illustration of the border-defense game environment. An attacker ($A$) spawns in the designated area at the top and attempts to reach the target line ($\mathcal{T}$) at the bottom. The defender team, composed of agents with varying sensing ($D_1$, $D_2$) and capture ($D_2$, $D_3$) capabilities, must coordinate to intercept the attacker.
Figure 2: Geometric representation of the optimal capture strategy in the deterministic pursuit phase (Phase II). The Apollonius Circle ($\mathcal{A}$) defines the boundary of the attacker's dominance region. The optimal capture point is the location on the circle with the minimum y-coordinate, and the optimal trajectories for both agents converge at this point.
Figure 3: Optimal capture point computation for a multi-defender scenario. The attacker's reachable region is the intersection of the individual Apollonius Circles ($\mathcal{A}_1, \mathcal{A}_2$). The optimal interception point, marked by the red star, corresponds to the minimum y-coordinate within this intersection, representing the game-theoretic payoff.
Figure 4: Value function level-set for optimal defender placement. The contours illustrate the Nash equilibrium payoff ($J^*$) as a function of a third defender's position, given two fixed defenders. Blue regions indicate configurations that yield a better payoff for the attacker, while red regions are more favorable for the defenders.
Figure 5: Training performance in the 1v1 homogeneous scenario. The GT-assisted policy (blue) converges faster and to a significantly higher mean reward ($\mu=0.605$) than the end-to-end baseline (red, $\mu=0.545$), demonstrating a 10% performance improvement.
...and 2 more figures
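
The multi-defender construction described in the Figure 3 caption (minimum $y$-coordinate over the intersection of the individual Apollonius disks) can be sketched as follows. This is an illustrative brute-force routine, not the paper's algorithm: it assumes each disk is already given as a `((cx, cy), r)` pair and relies on the fact that a linear objective over an intersection of disks attains its minimum either at the lowest point of one disk or at a point where two boundaries cross. All function names are hypothetical.

```python
import math


def circle_intersections(c1, r1, c2, r2):
    """Points where two circle boundaries meet.

    Returns two entries (coincident if the circles are tangent),
    or an empty list if the boundaries do not meet.
    """
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    d = math.hypot(dx, dy)
    if d == 0.0 or d > r1 + r2 or d < abs(r1 - r2):
        return []
    # Distance from c1 to the chord joining the two intersection points.
    a = (r1 * r1 - r2 * r2 + d * d) / (2.0 * d)
    h = math.sqrt(max(0.0, r1 * r1 - a * a))
    mx, my = c1[0] + a * dx / d, c1[1] + a * dy / d
    return [(mx - h * dy / d, my + h * dx / d),
            (mx + h * dy / d, my - h * dx / d)]


def lowest_point_in_intersection(disks, tol=1e-9):
    """Minimum-y point of the intersection of disks, or None if it is empty."""
    # Candidate 1: the lowest point of each individual disk.
    candidates = [(c[0], c[1] - r) for c, r in disks]
    # Candidate 2: every pairwise boundary crossing.
    for i in range(len(disks)):
        for j in range(i + 1, len(disks)):
            (ci, ri), (cj, rj) = disks[i], disks[j]
            candidates.extend(circle_intersections(ci, ri, cj, rj))
    # Keep only candidates that lie inside every disk.
    feasible = [
        p for p in candidates
        if all(math.hypot(p[0] - c[0], p[1] - c[1]) <= r + tol for c, r in disks)
    ]
    return min(feasible, key=lambda p: p[1]) if feasible else None


if __name__ == "__main__":
    disks = [((-1.0, 0.0), 2.0), ((1.0, 0.0), 2.0)]
    print(lowest_point_in_intersection(disks))
```

The enumeration is quadratic in the number of disks for candidate generation and cubic overall once feasibility checks are included, which is acceptable for small defender teams.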

Theorems & Definitions (12)

Definition 1: Phase I - Stochastic Search
Definition 2: Phase II - Deterministic Pursuit
Remark 1: Search-Pursuit Trade-off
Lemma 1: Dominance Region [Isaacs1965, dorothy2024one]
Theorem 1: Nash Equilibrium Payoff
Proof
Remark 2: Effect of Nonzero Capture Radius
Definition 3: Multi-Defender Dominance Region
Proposition 1: Multi-Defender Nash Equilibrium
Remark 3: Computational Complexity
...and 2 more
