Bayesian Ambiguity Contraction-based Adaptive Robust Markov Decision Processes for Adversarial Surveillance Missions
Jimin Choi, Max Z. Li
TL;DR
The paper tackles autonomous intelligence, surveillance, and reconnaissance (ISR) planning under adversarial threat uncertainty by introducing a graph-based, two-phase adaptive robust Markov decision process (RMDP) tailored to Collaborative Combat Aircraft (CCAs). It combines node-wise Bayesian belief updates with contraction of threat-ambiguity sets, so that sensing and movement shift progressively from conservative to efficient while robust planning preserves safety. Theoretical guarantees include almost-sure convergence of the time-varying robust operators and asymptotic optimality as credible threat sets contract to the true threat, along with probabilistic safety bounds and per-replan complexity analyses. Empirical results across Gaussian and non-Gaussian threat models and multiple network topologies show that the adaptive planner achieves higher total rewards and fewer exposures than static robust and nominal baselines, supporting the approach’s effectiveness and scalability.
Abstract
Collaborative Combat Aircraft (CCAs) are envisioned to enable autonomous Intelligence, Surveillance, and Reconnaissance (ISR) missions in contested environments, where adversaries may act strategically to deceive or evade detection. These missions pose challenges due to model uncertainty and the need for safe, real-time decision-making. Robust Markov Decision Processes (RMDPs) provide worst-case guarantees but are limited by static ambiguity sets that capture initial uncertainty without adapting to new observations. This paper presents an adaptive RMDP framework tailored to ISR missions with CCAs. We introduce a mission-specific formulation in which aircraft alternate between movement and sensing states. Adversarial tactics are modeled as a finite set of transition kernels, each capturing assumptions about how adversarial sensing or environmental conditions affect rewards. Our approach incrementally refines policies by eliminating inconsistent threat models, allowing agents to shift from conservative to aggressive behaviors while maintaining robustness. We provide theoretical guarantees showing that the adaptive planner converges as credible sets contract to the true threat and maintains safety under uncertainty. Experiments under Gaussian and non-Gaussian threat models across diverse network topologies show higher mission rewards and fewer exposure events compared to nominal and static robust planners.
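
The idea of refining a policy by discarding inconsistent threat models can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the three candidate threat models, the likelihoods, the rewards, and the 95% highest-posterior-mass rule for the credible set are all assumptions made for illustration. The sketch keeps a belief over a finite set of threat models at one node, updates it by Bayes' rule after each observation, keeps only the most plausible models, and picks the action with the best worst-case reward over the models that remain.

```python
from typing import Dict, List, Sequence


def bayes_update(belief: Sequence[float], likelihoods: Sequence[float]) -> List[float]:
    """Posterior over candidate threat models after one observation."""
    unnorm = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnorm)
    if total == 0.0:
        # Observation impossible under every model: keep the prior (assumption).
        return list(belief)
    return [u / total for u in unnorm]


def credible_set(belief: Sequence[float], mass: float = 0.95) -> List[int]:
    """Smallest set of model indices whose posterior mass reaches `mass`."""
    order = sorted(range(len(belief)), key=lambda k: belief[k], reverse=True)
    kept: List[int] = []
    total = 0.0
    for k in order:
        kept.append(k)
        total += belief[k]
        if total >= mass:
            break
    return sorted(kept)


def robust_action(rewards: Dict[str, Sequence[float]], models: Sequence[int]) -> str:
    """Max-min choice: best action against the worst model still in the set."""
    return max(rewards, key=lambda a: min(rewards[a][k] for k in models))


# Hypothetical node with three threat models: low, medium, high.
# Probability of observing "no alert" under each model (made-up numbers).
NO_ALERT_LIKELIHOOD = [0.9, 0.5, 0.1]

# Hypothetical one-step reward of each action under each threat model.
REWARDS = {
    "sense_close": [10.0, 2.0, -20.0],  # efficient, but costly if threat is high
    "sense_far": [4.0, 3.0, 1.0],       # conservative, safe under every model
}

belief = [1 / 3, 1 / 3, 1 / 3]  # uniform prior over the three models
for step in range(1, 7):
    belief = bayes_update(belief, NO_ALERT_LIKELIHOOD)
    models = credible_set(belief)
    print(step, [round(b, 3) for b in belief], models, robust_action(REWARDS, models))
```

With these invented numbers, the planner prefers the conservative `sense_far` while the high-threat or medium-threat model remains credible, and switches to `sense_close` once repeated "no alert" observations shrink the credible set to the low-threat model alone. A full planner would replace the one-step max-min choice with a robust Bellman backup over the movement and sensing states.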
