TAB-Fields: A Maximum Entropy Framework for Mission-Aware Adversarial Planning

Gokul Puthumanaillam, Jae Hyuk Song, Nurzhan Yesmagambet, Shinkyu Park, Melkior Ornik

TL;DR

The paper tackles autonomous adversarial planning when the adversary's exact policy is unknown but mission constraints are known. It introduces Task-Aware Behavior Fields (TAB-Fields), a maximum-entropy representation that encodes the distribution over adversary states subject to mission and environmental constraints, derived by solving a constrained KL-minimization with a reference process. TAB-Fields are integrated into planning through TAB-conditioned POMCP, where adversary transitions are sampled from TAB-Fields and beliefs are updated with observations. Empirical results in both hardware (ground robots) and simulation show TAB-POMCP outperforms baselines that assume fixed policies or ignore mission constraints, demonstrating scalability and improved decision-making in mission-constrained adversarial settings, with modest computational overhead.

Abstract

Autonomous agents operating in adversarial scenarios face a fundamental challenge: while they may know their adversaries' high-level objectives, such as reaching specific destinations within time constraints, the exact policies these adversaries will employ remain unknown. Traditional approaches address this challenge by treating the adversary's state as a partially observable element, leading to a formulation as a Partially Observable Markov Decision Process (POMDP). However, the induced belief-space dynamics in a POMDP require knowledge of the system's transition dynamics, which, in this case, depend on the adversary's unknown policy. Our key observation is that while an adversary's exact policy is unknown, their behavior is necessarily constrained by their mission objectives and the physical environment, allowing us to characterize the space of possible behaviors without assuming specific policies. In this paper, we develop Task-Aware Behavior Fields (TAB-Fields), a representation that captures adversary state distributions over time by computing the most unbiased probability distribution consistent with known constraints. We construct TAB-Fields by solving a constrained optimization problem that minimizes additional assumptions about adversary behavior beyond mission and environmental requirements. We integrate TAB-Fields with standard planning algorithms by introducing TAB-conditioned POMCP, an adaptation of Partially Observable Monte Carlo Planning. Through experiments in simulation with underwater robots and hardware implementations with ground robots, we demonstrate that our approach achieves superior performance compared to baselines that either assume specific adversary policies or neglect mission constraints altogether. Evaluation videos and code are available at https://tab-fields.github.io.
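To make the idea of "the most unbiased distribution consistent with known constraints" concrete, here is a minimal sketch on a toy grid. It is not the authors' implementation. It rests on one standard fact: minimizing KL divergence to a reference process subject to a hard constraint gives the reference process conditioned on that constraint. The grid, the uniform random-walk reference process, the single "be at the goal by the deadline" mission, and the names `reference_matrix` and `behavior_field` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical move set for the reference random walk: stay, up, down, left, right.
MOVES = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def reference_matrix(free, goal):
    """Transition matrix Q of a uniform random walk on free cells (goal is absorbing)."""
    H, W = free.shape
    Q = np.zeros((H * W, H * W))
    for r in range(H):
        for c in range(W):
            if not free[r, c]:
                continue  # obstacle cells have no outgoing transitions
            s = r * W + c
            if (r, c) == goal:
                Q[s, s] = 1.0  # mission complete: remain at the goal
                continue
            nxt = [(r + dr) * W + (c + dc) for dr, dc in MOVES
                   if 0 <= r + dr < H and 0 <= c + dc < W and free[r + dr, c + dc]]
            Q[s, nxt] = 1.0 / len(nxt)
    return Q


def behavior_field(free, start, goal, horizon):
    """Per-step state marginals of the reference walk conditioned on being at `goal` at `horizon`."""
    H, W = free.shape
    Q = reference_matrix(free, goal)
    n = H * W

    # Backward pass: h[t, s] = probability under Q of satisfying the mission from s at time t.
    h = np.zeros((horizon + 1, n))
    h[horizon, goal[0] * W + goal[1]] = 1.0
    for t in range(horizon - 1, -1, -1):
        h[t] = Q @ h[t + 1]

    s0 = start[0] * W + start[1]
    if h[0, s0] == 0.0:
        raise ValueError("mission is infeasible from the start cell within the horizon")

    # Forward pass with the conditioned kernel P_t(s'|s) = Q(s'|s) h[t+1, s'] / h[t, s].
    mu = np.zeros((horizon + 1, n))
    mu[0, s0] = 1.0
    for t in range(horizon):
        safe_h = np.where(h[t] > 0, h[t], 1.0)  # rows with h = 0 are never visited
        P = Q * h[t + 1][None, :] / safe_h[:, None]
        mu[t + 1] = mu[t] @ P
    return mu.reshape(horizon + 1, H, W)


# Usage: 5x5 grid with the center blocked, 12 steps to travel corner to corner.
free = np.ones((5, 5), dtype=bool)
free[2, 2] = False
field = behavior_field(free, start=(0, 0), goal=(4, 4), horizon=12)
```

Each slice `field[t]` is a probability map over adversary positions at step `t`. In this sketch, a planner could sample simulated adversary moves from the conditioned kernel rather than from an assumed fixed policy.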

Paper Structure

This paper contains 10 sections, 3 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Overview of the proposed approach applied to an interception task. The adversary's task is defined by mission objectives and environmental constraints (left). TAB-Fields are generated over time (top) to represent adversary state distributions and integrated into the planning process via TAB-conditioned POMCP (right). The resulting trajectories show the adversary's path (red line), the agent's response (green line), and the interception area.
  • Figure 2: Example mission and its TAB-Field, where darker areas indicate higher probability of adversary presence. Red area denotes adversary start position and purple area indicates the goal checkpoint.
  • Figure 3: Comparison of agent (green) and adversary (red) trajectories produced by different approaches. Light red circles indicate points of full observability at checkpoints; the interception area is also marked. Adversary mission: Reach target [x,y] after visiting any three different checkpoints, taking no more than 10s between checkpoints, while avoiding the center of the environment.
  • Figure 4: Agent (green) and adversary (red) trajectories using TAB-POMCP. Teal bubbles indicate checkpoints. Adversary task: Reach corals after visiting checkpoints 1, 2, 3 in order, taking no more than 30s between checkpoints.
  • Table 2: Performance comparison between different methods on ATCR across different mission categories in an underwater setting. Mission types are the same as those in the preceding comparison table and are abbreviated as M1 through M5.
  • ...and 1 more figure