Deceptive Path Planning via Reinforcement Learning with Graph Neural Networks

Michael Y. Fatemi, Wesley A. Suttle, Brian M. Sadler

TL;DR

This work tackles deceptive path planning by reframing it as a graph-based, model-free RL problem that leverages local perception and Graph Neural Networks to generalize across unseen graphs and scale to larger environments. It introduces deception bonuses and a graph-centered state representation to translate classical deception objectives into an RL reward structure, enabling on-the-fly tunability via a time limit $T_{max}$ and real-time adaptivity to changing conditions. Empirical results on gridworlds and a continuous forest navigation problem show that trained policies achieve tunable deception without retraining, generalize to larger graphs, and adapt to dynamic decoy changes. The approach offers a scalable, transferable framework for DPP with potential applications in robotics and autonomous systems where observer inference plays a critical role.

Abstract

Deceptive path planning (DPP) is the problem of designing a path that hides its true goal from an outside observer. Existing methods for DPP rely on unrealistic assumptions, such as global state observability and perfect model knowledge, and are typically problem-specific, meaning that even minor changes to a previously solved problem can force expensive computation of an entirely new solution. Given these drawbacks, such methods do not generalize to unseen problem instances, lack scalability to realistic problem sizes, and preclude both on-the-fly tunability of deception levels and real-time adaptivity to changing environments. In this paper, we propose a reinforcement learning (RL)-based scheme for training policies to perform DPP over arbitrary weighted graphs that overcomes these issues. The core of our approach is the introduction of a local perception model for the agent, a new state space representation distilling the key components of the DPP problem, the use of graph neural network-based policies to facilitate generalization and scaling, and the introduction of new deception bonuses that translate the deception objectives of classical methods to the RL setting. Through extensive experimentation, we show that, at test time and without additional fine-tuning, the resulting policies successfully generalize, scale, enjoy tunable levels of deception, and adapt in real time to changes in the environment.
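To make the local perception idea concrete, the following is a minimal sketch of extracting an agent's $k$-hop neighborhood from a weighted graph by breadth-first search. It is an illustration under stated assumptions, not the authors' implementation: the adjacency format, the helper names `grid_graph` and `k_hop_neighborhood`, and the unit edge weights are all hypothetical.

```python
from collections import deque


def grid_graph(rows, cols, weight=1.0):
    """Build a 4-connected gridworld as node -> {neighbor: edge_weight}.

    Hypothetical helper: unit weights are an assumption for illustration.
    """
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = {}
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs[(rr, cc)] = weight
            adj[(r, c)] = nbrs
    return adj


def k_hop_neighborhood(adj, source, k):
    """Return hop counts for nodes within k hops of `source`, plus the
    weighted edges induced on those nodes. Hops count edges, not weights."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if hops[node] == k:
            continue  # do not expand past the perception radius
        for nbr in adj[node]:
            if nbr not in hops:
                hops[nbr] = hops[node] + 1
                queue.append(nbr)
    # Keep only edges whose endpoints are both visible to the agent.
    edges = {(u, v): w for u in hops for v, w in adj[u].items() if v in hops}
    return hops, edges


adj = grid_graph(8, 8)
corner_hops, corner_edges = k_hop_neighborhood(adj, (0, 0), 2)
center_hops, _ = k_hop_neighborhood(adj, (4, 4), 2)
```

A policy restricted to such a subgraph never needs the full map, which is one plausible reason the same policy can be applied to graphs of a different size than those seen in training.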

Paper Structure (28 sections, 9 equations, 14 figures)

Figures (14)

  • Figure 1: After training on only six small gridworld problems, our GNN-equipped RL agent is able to perform tunably deceptive navigation through a never-before-seen, continuous forest environment using only local perception. Deceptiveness is achieved through exaggeration towards a decoy goal and is tuned by allowing the agent 15, 20, 25, and 30 additional steps of "time-to-deceive" before reaching the goal.
  • Figure 2: Comparison of classical ambiguity from savas2022deceptive with our ambiguity measure (Eq. \ref{eqn:our_ambiguity}). Colors denote: start position, true goal, decoy goal.
  • Figure 3: The set of grid worlds used in training. We considered three $8\times8$ and three $16\times16$ topologies and found that this was sufficient for generalization.
  • Figure 4: Learning curves for various $k$, where $k$ is both the number of GNN layers and the radius of the agent's $k$-hop neighborhood. Curves present mean and 95% confidence intervals over five independent replications. For ambiguity, we found that one layer is not enough to act deceptively, while performance on validation data peaks at two layers before dropping off again for four and eight layers. For exaggeration-tuned behavior, the optimal $k$ is four: one layer is again not enough to act deceptively, two layers yield a slight improvement, and four layers perform best.
  • Figure 5: GNN architecture comparison. Curves present mean and 95% confidence intervals over five independent replications. For exaggeration-tuned behavior, we found a trade-off between deceptiveness and path efficiency; this is to be expected, as extra bias in the path towards a decoy goal adds extra distance compared to the baseline shortest path. For ambiguity, we found that GraphSAGE and graph isomorphism networks were able to reach the goal under the time limit reasonably frequently, while also maximizing the level of ambiguity in the generated path. The graph attention network performed similarly for ambiguity, but exhibited an inability to balance acting deceptively with reaching the goal.
  • ...and 9 more figures
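
The Figure 4 caption ties the number of GNN layers to the radius of the agent's $k$-hop neighborhood. A minimal sketch of why the two coincide is given below: each round of neighbor aggregation moves information exactly one hop, so $k$ rounds give a node access to features at most $k$ hops away. The mean aggregator with a self-loop, the path graph, and the names `mean_aggregate` and `reached` are assumptions chosen for illustration; there are no learned weights or nonlinearities here, and this is not any of the architectures compared in the paper.

```python
def mean_aggregate(adj, features, num_layers):
    """Apply `num_layers` rounds of mean aggregation over each node and
    its neighbors. `adj` maps node -> list of neighbors; `features` maps
    node -> float."""
    h = dict(features)
    for _ in range(num_layers):
        h = {
            node: (h[node] + sum(h[nbr] for nbr in nbrs)) / (1 + len(nbrs))
            for node, nbrs in adj.items()
        }
    return h


# Path graph 0 - 1 - 2 - 3 - 4 - 5 (hypothetical toy example).
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

# A single nonzero feature at node 0, standing in for a marked location.
signal = {i: 0.0 for i in range(6)}
signal[0] = 1.0


def reached(num_layers):
    """Nodes whose representation depends on node 0 after `num_layers` rounds."""
    h = mean_aggregate(path, signal, num_layers)
    return [node for node, value in h.items() if value > 0.0]


print(reached(1))  # [0, 1]
print(reached(2))  # [0, 1, 2]
print(reached(4))  # [0, 1, 2, 3, 4]
```

In this toy setting, a node four hops from the marked location receives no information about it until the fourth round, which matches the caption's reading of $k$ as both depth and perception radius.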