Value of Information-based Deceptive Path Planning Under Adversarial Interventions

Wesley A. Suttle, Jesse Milzman, Mustafa O. Karabag, Brian M. Sadler, Ufuk Topcu

TL;DR

This paper addresses deceptive path planning (DPP) under adversarial interventions, where an observer can modify the environment to impede the agent. It introduces a Markov decision process (MDP) model for DPP in adversarial settings and two value-of-information (VoI) deception objectives that quantify deception via the observer’s beliefs about the agent’s true goal. The authors derive tractable, linear-programming-based solution methods that compute globally optimal policies, and show on illustrative gridworlds that VoI DPP yields flexible deception and lower post-intervention costs than passive-observer approaches and conservative planning. By coupling deception with observer interventions, the work enables controllable trade-offs between deception and path efficiency using practical computational methods.

Abstract

Existing methods for deceptive path planning (DPP) address the problem of designing paths that conceal their true goal from a passive, external observer. Such methods do not apply to problems where the observer has the ability to perform adversarial interventions to impede the path planning agent. In this paper, we propose a novel Markov decision process (MDP)-based model for the DPP problem under adversarial interventions and develop new value of information (VoI) objectives to guide the design of DPP policies. Using the VoI objectives we propose, path planning agents deceive the adversarial observer into choosing suboptimal interventions by selecting trajectories that are of low informational value to the observer. Leveraging connections to the linear programming theory for MDPs, we derive computationally efficient solution methods for synthesizing policies for performing DPP under adversarial interventions. In our experiments, we illustrate the effectiveness of the proposed solution method in achieving deceptiveness under adversarial interventions and demonstrate the superior performance of our approach to both existing DPP methods and conservative path planning approaches on illustrative gridworld problems.

Paper Structure

This paper contains 21 sections, 18 equations, 4 figures.

Figures (4)

  • Figure 1: VoI DPP enjoys flexible deceptiveness in the adversarial setting. VoI deception generates a variety of deceptive paths as $\gamma_a$ varies, while baseline methods fail by producing either shortest paths (Exaggeration, Ambiguity for $\gamma_a > 0.5$) or overly conservative path planning (CPP) paths (Conservative). For small values of $\gamma_a$, VoI Exaggeration and VoI Ambiguity approximately recover the Conservative baseline.
  • Figure 2: Existing methods generate inappropriate trajectories in the adversarial setting. VoI DPP methods recognize that only the top two goals are relevant, since the observer cannot affect the cost of reaching the remaining goals. Exaggeration (all $\gamma_a$ values) and Ambiguity ($\gamma_a = 0.5$) are distracted by irrelevant candidate goals in the lower half and generate wasteful, purposeless paths. For smaller $\gamma_a$, VoI methods approximately recover CPP paths.
  • Figure 3: Performance comparison under adversarial interventions on the Fig. \ref{fig:rooms10} gridworld. Agents deceive up until the intervention time, then follow shortest paths afterwards. Interventions are selected according to the observer's belief at the time of intervention. Plots show the ratio of total path length to the length of the shortest path from the start state to the true goal: a value of $1.0$ corresponds to a shortest path and is the minimum possible value, while larger values capture the excess cost incurred for behaving deceptively (for DPP) or conservatively (for CPP). Total costs are aggregated over 10 values of $\gamma_a$; boxes and whiskers show the mean and standard deviation. The critical deception window is from timesteps 1 to 5, during which VoI-based deception is particularly effective, outperforming passive-observer methods and outperforming or remaining competitive with CPP.
  • Figure 4: Performance comparison under adversarial interventions on the Fig. \ref{fig:rooms19} gridworld. Agents deceive up until the intervention time, then follow shortest paths afterwards. Interventions are selected according to the observer's belief at the time of intervention. Plots show the ratio of total path length to the length of the shortest path from the start state to the true goal. Total costs are aggregated over 10 values of $\gamma_a$; boxes and whiskers show the mean and standard deviation. The critical deception window in this problem is from timesteps 19 to 30. Throughout this window, VoI deception significantly outperforms classical exaggeration methods and either outperforms or remains competitive with CPP. VoI approaches tend to outperform classical ambiguity through timestep 26, but are subsequently outperformed by it.
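
The path-length ratio reported in Figures 3 and 4 can be illustrated with a minimal sketch. The code below is not the authors' implementation. It assumes a 4-connected gridworld with unit step costs, models an intervention as a hypothetical set of cells that become blocked, and takes the denominator to be the shortest path on the unmodified grid. The names `shortest_path_length`, `cost_ratio`, and `blocked` are illustrative and do not come from the paper.

```python
from collections import deque


def shortest_path_length(grid, start, goal):
    """Breadth-first search on a 4-connected grid; '#' marks a wall.

    Returns the number of steps from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            return dist[cell]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[cell] + 1
                queue.append((nr, nc))
    return None


def cost_ratio(grid, prefix, goal, blocked=()):
    """Ratio of total path length to the pre-intervention shortest path.

    prefix  : cells visited up to the intervention time, starting at the
              start state (the deceptive or conservative part of the path).
    blocked : hypothetical intervention, a set of cells the observer walls
              off; afterwards the agent follows a shortest path to the goal.
    """
    start, current = prefix[0], prefix[-1]
    baseline = shortest_path_length(grid, start, goal)
    # Apply the (assumed) intervention by turning blocked cells into walls.
    modified = [
        "".join("#" if (r, c) in blocked else ch for c, ch in enumerate(row))
        for r, row in enumerate(grid)
    ]
    remaining = shortest_path_length(modified, current, goal)
    if not baseline or remaining is None:
        return None  # ratio undefined: start equals goal or goal unreachable
    return ((len(prefix) - 1) + remaining) / baseline


if __name__ == "__main__":
    grid = ["...", "...", "..."]
    # Agent has not moved yet; observer blocks the cell on the direct route.
    print(cost_ratio(grid, [(0, 0)], (0, 2), blocked={(0, 1)}))
```

Under these assumptions a ratio of 1.0 means the agent paid nothing beyond the shortest path, and larger values measure the extra steps caused by the pre-intervention detour and by the intervention itself.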