Strategic Communication under Threat: Learning Information Trade-offs in Pursuit-Evasion Games
Valerio La Gatta, Dolev Mutzari, Sarit Kraus, VS Subrahmanian
TL;DR
This work tackles the problem of strategic information gathering under risk in adversarial settings by formulating the Pursuit-Evasion-Exposure-Concealment (PEEC) game. It introduces SHADOW, a multi-headed RL framework that jointly learns continuous navigation, discrete querying, and opponent modeling to balance information gain against exposure risk under non-holonomic, asymmetric dynamics. The key contributions are a formal CIAC measure for information acquisition, a flexible RL architecture that handles partial observability and memory, and extensive empirical evidence that SHADOW outperforms diverse baselines and generalizes across varying risk levels and agent speeds. The findings have practical implications for safe autonomous decision-making in surveillance, search-and-rescue, and defense contexts where information disclosure carries real consequences. Together, the PEEC and CIAC formalisms and SHADOW enable robust, risk-aware communication policies that adapt to threat conditions and environmental asymmetries.
Abstract
Adversarial environments require agents to navigate a key strategic trade-off: acquiring information enhances situational awareness, but may simultaneously expose them to threats. To investigate this tension, we formulate a Pursuit-Evasion-Exposure-Concealment Game (PEEC) in which a pursuer agent must decide when to communicate in order to obtain the evader's position. Each communication reveals the pursuer's location, increasing the risk of being targeted. Both agents learn their movement policies via reinforcement learning, while the pursuer additionally learns a communication policy that balances observability and risk. We propose SHADOW (Strategic-communication Hybrid Action Decision-making under partial Observation for Warfare), a multi-headed sequential reinforcement learning framework that integrates continuous navigation control, discrete communication actions, and opponent modeling for behavior prediction. Empirical evaluations show that SHADOW pursuers achieve higher success rates than six competitive baselines. Our ablation study confirms that temporal sequence modeling and opponent modeling are critical for effective decision-making. Finally, our sensitivity analysis reveals that the learned policies generalize well across varying communication risks and physical asymmetries between agents.
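To make the hybrid action space concrete, the following is a minimal illustrative sketch of a single pursuer step in a PEEC-like setting. It is not the authors' implementation. It only shows how a continuous turn command and a discrete query decision might be combined, and how a query trades fresh information for exposure. All names (`HybridAction`, `PursuerState`, `step_pursuer`, `staleness_query_rule`), the unicycle-style motion model, and the parameter values are assumptions made for illustration. The hand-written staleness rule is only a stand-in for the learned communication policy.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class HybridAction:
    turn_rate: float  # continuous navigation command in rad/step (hypothetical)
    query: bool       # discrete communication decision


@dataclass
class PursuerState:
    x: float
    y: float
    heading: float
    last_known_evader: Optional[Tuple[float, float]] = None
    steps_since_query: int = 0
    exposed: bool = False  # True if the pursuer revealed its location this step


def step_pursuer(state, action, evader_pos, speed=1.0, max_turn=0.5):
    """Advance the pursuer one step under an assumed unicycle-style model."""
    # Non-holonomic motion: clip the turn, update heading, then move forward.
    turn = max(-max_turn, min(max_turn, action.turn_rate))
    heading = state.heading + turn
    x = state.x + speed * math.cos(heading)
    y = state.y + speed * math.sin(heading)

    if action.query:
        # Querying yields the evader's current position but exposes the pursuer.
        return PursuerState(x, y, heading, evader_pos, 0, True)
    # Without a query, the last observation simply grows staler.
    return PursuerState(
        x, y, heading, state.last_known_evader, state.steps_since_query + 1, False
    )


def staleness_query_rule(state, max_staleness=3):
    """Hand-written placeholder for a learned policy: query when information is stale."""
    return state.last_known_evader is None or state.steps_since_query >= max_staleness
```

In a learned setting, the fixed staleness threshold would be replaced by a policy head that weighs the value of an updated evader position against the risk incurred by exposure.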
