Bayesian Ambiguity Contraction-based Adaptive Robust Markov Decision Processes for Adversarial Surveillance Missions

Jimin Choi, Max Z. Li

TL;DR

The paper addresses autonomous Intelligence, Surveillance, and Reconnaissance (ISR) planning under adversarial threat uncertainty by introducing a graph-based, two-phase adaptive robust Markov Decision Process (RMDP) tailored to Collaborative Combat Aircraft (CCAs). It combines node-wise Bayesian belief updates with contraction of threat-ambiguity sets, so that the planner shifts progressively from conservative to efficient sensing and movement while robust planning preserves safety. Theoretical guarantees include almost-sure convergence of the time-varying robust operators, asymptotic optimality as credible threat sets contract to the true threat, probabilistic safety bounds, and per-replan complexity analyses. Experiments across Gaussian and non-Gaussian threat models and multiple network topologies show that the adaptive planner achieves higher total rewards and fewer exposures than static robust and nominal baselines, supporting its effectiveness and scalability.

Abstract

Collaborative Combat Aircraft (CCAs) are envisioned to enable autonomous Intelligence, Surveillance, and Reconnaissance (ISR) missions in contested environments, where adversaries may act strategically to deceive or evade detection. These missions pose challenges due to model uncertainty and the need for safe, real-time decision-making. Robust Markov Decision Processes (RMDPs) provide worst-case guarantees but are limited by static ambiguity sets that capture initial uncertainty without adapting to new observations. This paper presents an adaptive RMDP framework tailored to ISR missions with CCAs. We introduce a mission-specific formulation in which aircraft alternate between movement and sensing states. Adversarial tactics are modeled as a finite set of transition kernels, each capturing assumptions about how adversarial sensing or environmental conditions affect rewards. Our approach incrementally refines policies by eliminating inconsistent threat models, allowing agents to shift from conservative to aggressive behaviors while maintaining robustness. We provide theoretical guarantees showing that the adaptive planner converges as credible sets contract to the true threat and maintains safety under uncertainty. Experiments under Gaussian and non-Gaussian threat models across diverse network topologies show higher mission rewards and fewer exposure events compared to nominal and static robust planners.
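The elimination of inconsistent threat models described in the abstract can be illustrated with a minimal sketch. It assumes a single node with three hypothetical threat prototypes, each predicting a scalar Gaussian observation, and a credible set defined as the smallest group of prototypes whose posterior mass reaches a chosen level. The function names, the 0.95 level, and the observation values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_likelihood(obs, means, sigma):
    """Likelihood of one scalar observation under each prototype's Gaussian model."""
    return np.exp(-0.5 * ((obs - means) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def update_belief(belief, likelihoods):
    """One Bayes step: posterior is proportional to prior times likelihood."""
    posterior = belief * likelihoods
    return posterior / posterior.sum()

def credible_set(belief, mass=0.95):
    """Smallest set of prototype indices whose posterior mass reaches `mass`."""
    order = np.argsort(belief)[::-1]            # most probable prototype first
    cumulative = np.cumsum(belief[order])
    k = int(np.searchsorted(cumulative, mass)) + 1
    return sorted(order[:k].tolist())

# Hypothetical node: three prototypes with assumed observation means.
means = np.array([0.0, 1.0, 2.0])
sigma = 0.5
belief = np.full(3, 1 / 3)                      # uniform prior over prototypes
initial_set = credible_set(belief)

# Made-up observations clustered around the second prototype's mean.
for obs in [1.0, 0.9, 1.1, 1.0, 1.2]:
    belief = update_belief(belief, gaussian_likelihood(obs, means, sigma))
final_set = credible_set(belief)

print("credible set before:", initial_set)
print("credible set after: ", final_set)
```

Under the uniform prior every prototype stays in the credible set. After a few observations consistent with one prototype, the others fall below the mass threshold and are dropped, which is the contraction that lets a robust planner become less conservative.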

Paper Structure

This paper contains 34 sections, 4 theorems, 44 equations, 15 figures, 6 tables, 3 algorithms.

Key Result

Theorem 1

Let $(\Omega,\mathcal{F},\mathbb P)$ be a probability space and $\{\mathcal{F}_t\}_{t\ge0}$ an increasing filtration. A filtration $\{\mathcal{F}_t\}$ is an increasing sequence of $\sigma$-algebras representing the information available up to time $t$. A random variable or operator is $\mathcal{F}_t$ where $\mathcal{T}_\infty(\omega)$ denotes the operator obtained as $t \to \infty$. Since each $\ma

Figures (15)

  • Figure 1: Mission-level concept for decentralized ISR missions with CCAs. Each aircraft conducts surveillance within an assigned region under local threat uncertainty. Colored polygons indicate example regional areas, red dashed zones mark uncertain threat locations, and aircraft icons with sensing arcs illustrate sensing coverage.
  • Figure 2: Information flow in the adaptive RMDP framework. Observations refine credible sets, which induce transition ambiguity sets used by the robust Bellman operator.
  • Figure 3: A zoomed-in view of a single ISR operation region from Figure 1. The polygon shows one aircraft's assigned area, where each node represents a surveillance point with its own threat type, and edges define feasible movements. The aircraft performs sensing-based actions at nodes and movement actions along edges.
  • Figure 4: Comparison of reward distributions across different threat prototypes for Experiment 1. Each subplot shows the probability density of expected observation rewards for all available actions under a specific threat.
  • Figure 5: Exposure probability distributions corresponding to the threat prototypes in Experiment 1. The dashed vertical line indicates the decision threshold. Samples with exposure probability greater than 0.5 are considered exposed under the corresponding threat condition.
  • ...and 10 more figures
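The last step of the information flow in Figure 2, where a credible set induces the transition ambiguity set used by the robust Bellman operator, can be sketched as robust value iteration over a finite set of candidate kernels. This is a minimal illustration, not the paper's algorithm: the two-state example (safe, exposed), the two actions (cautious, aggressive), the reward values, and all function and variable names are assumptions. The worst case is taken independently for each state-action pair.

```python
import numpy as np

def robust_value_iteration(kernels, rewards, active, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Robust value iteration over the credible kernels.

    kernels: array (M, S, A, S) of candidate transition kernels
    rewards: array (S, A)
    active:  indices of the kernels still in the credible set
    """
    P = kernels[list(active)]                    # (m, S, A, S)
    V = np.zeros(rewards.shape[0])
    for _ in range(max_iter):
        # Expected next value under each credible kernel, worst case per (s, a).
        Q = rewards + gamma * (P @ V).min(axis=0)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = rewards + gamma * (P @ V).min(axis=0)    # greedy policy at the final V
    return V, Q.argmax(axis=1)

# Hypothetical example. States: 0 = safe, 1 = exposed. Actions: 0 = cautious, 1 = aggressive.
rewards = np.array([[1.0, 2.0], [-5.0, -5.0]])
kernels = np.array([
    # kernel 0: benign threat, aggressive sensing rarely leads to exposure
    [[[0.95, 0.05], [0.90, 0.10]], [[0.5, 0.5], [0.5, 0.5]]],
    # kernel 1: hostile threat, aggressive sensing usually leads to exposure
    [[[0.90, 0.10], [0.30, 0.70]], [[0.5, 0.5], [0.5, 0.5]]],
])

V_all, policy_all = robust_value_iteration(kernels, rewards, active=[0, 1])
V_benign, policy_benign = robust_value_iteration(kernels, rewards, active=[0])

print("both kernels credible:", V_all, policy_all)
print("benign kernel only:   ", V_benign, policy_benign)
```

In this toy setting the planner stays cautious in the safe state while the hostile kernel remains credible and switches to the aggressive action once only the benign kernel is left. Shrinking the credible set can only raise the robust value, because the minimum is taken over fewer kernels.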

Theorems & Definitions (16)

  • Definition 1: Rectangular ambiguity sets
  • Definition 2: Prototype
  • Theorem 1: Almost sure convergence of the adaptive robust value iteration
  • proof
  • Remark 1: Interpretation and application of Theorem 1
  • Corollary 1: Asymptotic optimality
  • Remark 2: Operational meaning of asymptotic optimality
  • Proposition 1: Probabilistic safety guarantee under $(s,a)$-rectangular uncertainty
  • proof
  • Remark 3: Safety guaranteed by the robust value
  • ...and 6 more
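
For orientation, the rectangularity named in Definition 1 and Proposition 1 is usually stated as follows in the robust MDP literature. This is the standard textbook form, given here as a sketch. The paper's own notation may differ, and the symbols $r$, $\gamma$, and $\Delta(\mathcal{S})$ (the probability simplex over states) are assumed rather than taken from the source. An ambiguity set is $(s,a)$-rectangular when it factors across state-action pairs:

$$
\mathcal{P} \;=\; \prod_{(s,a)\in\mathcal{S}\times\mathcal{A}} \mathcal{P}_{s,a}, \qquad \mathcal{P}_{s,a} \subseteq \Delta(\mathcal{S}).
$$

Under this structure the worst case can be chosen separately for each $(s,a)$, and the robust Bellman operator takes the form

$$
(\mathcal{T}V)(s) \;=\; \max_{a\in\mathcal{A}} \Big[\, r(s,a) + \gamma \min_{p\in\mathcal{P}_{s,a}} \sum_{s'\in\mathcal{S}} p(s')\,V(s') \Big].
$$

In an adaptive setting, replacing $\mathcal{P}_{s,a}$ with a time-indexed set that shrinks as observations arrive yields a time-varying sequence of operators, which is the kind of object whose convergence Theorem 1 addresses.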