Deceptive Path Planning via Reinforcement Learning with Graph Neural Networks
Michael Y. Fatemi, Wesley A. Suttle, Brian M. Sadler
TL;DR
This work tackles deceptive path planning by reframing it as a graph-based, model-free RL problem that leverages local perception and Graph Neural Networks to generalize across unseen graphs and scale to larger environments. It introduces deception bonuses and a graph-centered state representation to translate classical deception objectives into an RL reward structure, enabling on-the-fly tunability via a time limit $T_{max}$ and real-time adaptivity to changing conditions. Empirical results on gridworlds and a continuous forest navigation problem show that trained policies achieve tunable deception without retraining, generalize to larger graphs, and adapt to dynamic decoy changes. The approach offers a scalable, transferable framework for DPP with potential applications in robotics and autonomous systems where observer inference plays a critical role.
Abstract
Deceptive path planning (DPP) is the problem of designing a path that hides its true goal from an outside observer. Existing methods for DPP rely on unrealistic assumptions, such as global state observability and perfect model knowledge, and are typically problem-specific, meaning that even minor changes to a previously solved problem can force expensive computation of an entirely new solution. Given these drawbacks, such methods do not generalize to unseen problem instances, lack scalability to realistic problem sizes, and preclude both on-the-fly tunability of deception levels and real-time adaptivity to changing environments. In this paper, we propose a reinforcement learning (RL)-based scheme for training policies to perform DPP over arbitrary weighted graphs that overcomes these issues. Our approach rests on four components: a local perception model for the agent, a new state space representation distilling the key components of the DPP problem, graph neural network-based policies to facilitate generalization and scaling, and new deception bonuses that translate the deception objectives of classical methods to the RL setting. Through extensive experimentation we show that, at test time and without additional fine-tuning, the resulting policies generalize, scale, offer tunable levels of deception, and adapt in real time to changes in the environment.
