Strategic Communication under Threat: Learning Information Trade-offs in Pursuit-Evasion Games

Valerio La Gatta, Dolev Mutzari, Sarit Kraus, VS Subrahmanian

TL;DR

This work tackles the problem of strategic information gathering under risk in adversarial settings by formulating the Pursuit-Evasion-Exposure-Concealment (PEEC) game. It introduces SHADOW, a multi-headed RL framework that jointly learns continuous navigation, discrete querying, and opponent modeling to balance information gain against exposure risk under non-holonomic, asymmetric dynamics. The key contributions include a formal measure of the cost of acquiring information, the Critical Information Acquisition Cost (CIAC); a flexible RL architecture that handles partial observability and memory; and extensive empirical evidence showing that SHADOW outperforms diverse baselines and generalizes across varying risk levels and agent speeds. The findings have practical implications for safe autonomous decision-making in surveillance, search-and-rescue, and defense contexts, where disclosing information carries real consequences. Together with SHADOW, the PEEC and CIAC formalisms enable robust, risk-aware communication policies that adapt to threat conditions and environmental asymmetries.
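To make the information-versus-exposure trade-off concrete, here is a minimal toy sketch of one PEEC timestep. It is not the authors' environment: the unicycle model (constant speed, turn-rate control) used for the non-holonomic dynamics, the hypothetical `capture_radius` win condition, and the rule that the evader can hit the pursuer only on a step where a query has revealed the pursuer within the shooting radius $r_e$ are all assumptions made for illustration. The two `speed` fields stand in for the asymmetry $v_e / v_p$.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Unicycle:
    """Hypothetical non-holonomic agent: position (x, y), heading theta, fixed speed."""
    x: float
    y: float
    theta: float
    speed: float  # evader.speed / pursuer.speed plays the role of v_e / v_p

    def step(self, turn_rate: float, dt: float = 1.0) -> None:
        # Non-holonomic motion: the agent can only turn and move along its heading
        self.theta = (self.theta + turn_rate * dt) % (2 * math.pi)
        self.x += self.speed * math.cos(self.theta) * dt
        self.y += self.speed * math.sin(self.theta) * dt


@dataclass
class ToyPEEC:
    """Toy PEEC game state; the win rules below are illustrative assumptions."""
    pursuer: Unicycle
    evader: Unicycle
    capture_radius: float   # hypothetical: pursuer wins within this distance
    shooting_radius: float  # r_e: an exposed pursuer can be hit within this distance

    def step(self, u_p: float, q_p: int,
             u_e: float) -> Tuple[Optional[Tuple[float, float]], Optional[str]]:
        """Advance one timestep; return (evader position if queried, outcome or None)."""
        self.pursuer.step(u_p)
        self.evader.step(u_e)
        d = math.hypot(self.pursuer.x - self.evader.x, self.pursuer.y - self.evader.y)
        # A query (q_p == 1) returns the evader's position but reveals the pursuer
        obs = (self.evader.x, self.evader.y) if q_p == 1 else None
        if d <= self.capture_radius:
            return obs, "pursuer_wins"
        if q_p == 1 and d <= self.shooting_radius:
            return obs, "evader_wins"
        return obs, None
```

Under these toy rules, querying is the only way the pursuer learns where the evader is and also the only way it can be targeted, which mirrors the tension the PEEC game is built around.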

Abstract

Adversarial environments require agents to navigate a key strategic trade-off: acquiring information enhances situational awareness, but may simultaneously expose them to threats. To investigate this tension, we formulate a Pursuit-Evasion-Exposure-Concealment Game (PEEC) in which a pursuer agent must decide when to communicate in order to obtain the evader's position. Each communication reveals the pursuer's location, increasing the risk of being targeted. Both agents learn their movement policies via reinforcement learning, while the pursuer additionally learns a communication policy that balances observability and risk. We propose SHADOW (Strategic-communication Hybrid Action Decision-making under partial Observation for Warfare), a multi-headed sequential reinforcement learning framework that integrates continuous navigation control, discrete communication actions, and opponent modeling for behavior prediction. Empirical evaluations show that SHADOW pursuers achieve higher success rates than six competitive baselines. Our ablation study confirms that temporal sequence modeling and opponent modeling are critical for effective decision-making. Finally, our sensitivity analysis reveals that the learned policies generalize well across varying communication risks and physical asymmetries between agents.
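SHADOW combines a continuous navigation action with a discrete communication (query) action. One common way to handle such a hybrid action space is sketched below, assuming the navigation head outputs a Gaussian mean and log standard deviation and the query head outputs a single Bernoulli logit, both computed from shared (e.g., LSTM) features. Each head is sampled independently and the log-probabilities are summed for a policy-gradient update. The function names and this factorization are illustrative assumptions; the abstract does not state which distributions or RL algorithm SHADOW uses.

```python
import math
import random
from typing import Tuple


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z: float) -> float:
    # Logistic function written to avoid overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def gaussian_log_prob(x: float, mean: float, log_std: float) -> float:
    # log N(x; mean, std^2)
    std = math.exp(log_std)
    return -0.5 * ((x - mean) / std) ** 2 - log_std - 0.5 * math.log(2 * math.pi)


def bernoulli_log_prob(q: int, logit: float) -> float:
    # log sigmoid(logit) if q == 1, else log(1 - sigmoid(logit))
    return -softplus(-logit) if q == 1 else -softplus(logit)


def sample_hybrid_action(nav_mean: float, nav_log_std: float, query_logit: float,
                         rng: random.Random) -> Tuple[float, int, float]:
    """Sample (u_p, q_p) from two independent heads; return them with the joint log-prob."""
    u_p = rng.gauss(nav_mean, math.exp(nav_log_std))      # continuous navigation head
    q_p = 1 if rng.random() < sigmoid(query_logit) else 0  # binary query head
    log_prob = (gaussian_log_prob(u_p, nav_mean, nav_log_std)
                + bernoulli_log_prob(q_p, query_logit))
    return u_p, q_p, log_prob
```

For example, `sample_hybrid_action(0.0, -1.0, 2.0, random.Random(0))` returns a turn command, a query decision, and their joint log-probability. Summing the two log-probabilities is valid only because the heads are assumed conditionally independent given the shared features.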

Paper Structure

This paper contains 30 sections, 4 theorems, 17 equations, 11 figures, 2 tables.

Key Result

Proposition 1

Under the zero-sum assumption (i.e., $P_e \equiv -P_p$) and with $r_e = 0$, $\alpha_c^{\mathsf{Q}} \ge 0$ is a maximum.

Figures (11)

  • Figure 1: SHADOW Pursuer: The Pursuer operates its navigation control $u_p$ and decides whether to query the opponent's state via a binary action $q_p \in \{0,1\}$. The environment returns the updated pursuer state $\mathbf{s}_p$ and, if $q_p = 1$, the evader’s current position $\mathbf{s}_e$. The Mediator determines the pursuer’s internal state representation $\tilde{\mathbf{s}}$, comprising: (i) the current position of the pursuer $\mathbf{s}_p$, (ii) the elapsed time since the last observation, (iii) the last observed position of the evader, and (iv) the estimated current position of the evader which is either returned by the environment ($\mathbf{s}_e$), if $q_p = 1$, or inferred by the Opponent Modeling module ($\mathbf{s}_e'$) if $q_p = 0$. The Mediator also provides feedback $\mathcal{L}$ to the Opponent Modeling module, indicating prediction error when the true position of the adversary becomes available ($q_p = 1$). Finally, the pursuer's internal state $\tilde{\mathbf{s}}$ and reward $r_p$ are passed to the Query Decision and Navigation Module to decide next actions. All networks include a Memory Unit (e.g., LSTM) responsible for encoding the temporal observation history.
  • Figure 2: Ablation Study: $\underline{\mathsf{CIAC}}$ under different Opponent modeling (a) and LSTM (b) configurations.
  • Figure 3: Sensitivity analysis: Effect of the shooting radius $r_{\mathsf{e}}$ (a) and the speed ratio $v_e / v_p$ (b) on pursuer win rate $P_\text{win}$.
  • Figure 4: Training dynamics: Outcome and communication metrics over training episodes. Shaded regions correspond to distinct learning phases.
  • Figure 5: Examples of pursuer-evader interactions in our PEEC game: Each subplot illustrates a representative game, showing the trajectories of the pursuer and evader over the 2D map. The inset plot in each subplot reports the pursuer–evader distance as a function of the timestep.
  • ...and 6 more figures
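The Mediator described in the Figure 1 caption can be sketched as a small stateful object that assembles the internal state $\tilde{\mathbf{s}}$ from its four components and emits the feedback $\mathcal{L}$ when a query returns the true evader position. This is a hedged illustration only: the `predict` callable is a hypothetical stand-in for the learned Opponent Modeling module (which in SHADOW is a network with a memory unit), and the squared Euclidean error used as $\mathcal{L}$ and the origin as the initial last-seen position are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Vec = Tuple[float, float]


@dataclass
class Mediator:
    """Hypothetical sketch of the Mediator from Figure 1."""
    predict: Callable[[Vec, int], Vec]  # stand-in opponent model: (last seen, steps since) -> s_e'
    last_seen: Vec = (0.0, 0.0)         # assumption: evader initially believed at the origin
    elapsed: int = 0                    # timesteps since the last observation

    def update(self, s_p: Vec, q_p: int, s_e: Optional[Vec] = None):
        """Build the internal state and, when q_p == 1, the opponent-model loss."""
        loss = None
        if q_p == 1:
            if s_e is None:
                raise ValueError("a query (q_p == 1) must return the evader position s_e")
            # Score what the model would have predicted now, before resetting memory
            pred = self.predict(self.last_seen, self.elapsed + 1)
            loss = (pred[0] - s_e[0]) ** 2 + (pred[1] - s_e[1]) ** 2  # assumed squared error
            self.last_seen, self.elapsed = s_e, 0
            estimate = s_e
        else:
            self.elapsed += 1
            estimate = self.predict(self.last_seen, self.elapsed)  # inferred s_e'
        # (i) pursuer position, (ii) elapsed time, (iii) last seen evader, (iv) estimate
        state = (*s_p, float(self.elapsed), *self.last_seen, *estimate)
        return state, loss
```

With a trivial predictor such as `lambda pos, k: pos`, the estimate simply repeats the last observation; a learned model would instead extrapolate the evader's motion, and the returned loss is what would train it.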

Theorems & Definitions (8)

  • Definition 1: Critical Information Acquisition Cost ($\mathsf{CIAC}$)
  • Proposition 1
  • Definition 2: Base Information Acquisition Cost ($\underline{\mathsf{CIAC}}$)
  • Proposition 2
  • Proposition 2
  • proof
  • Proposition 2
  • proof