Occasionally Observed Piecewise-deterministic Markov Processes
Marissa Gee, Alexander Vladimirsky
TL;DR
This paper addresses optimal control of piecewise-deterministic Markov processes (PDMPs) when mode switches are not observed as they occur and mode observations are infrequent. It (i) formulates PDE-based optimal-control problems (Hamilton-Jacobi-Bellman equations and quasi-variational inequalities) for fixed, infinite, indefinite, and random horizons, with mode observations either at fixed times or on demand, and (ii) introduces a reduced-belief approach that remains tractable when switching rates are constant, with computational cost growing linearly in the number of modes $M$. The framework is demonstrated on surveillance-evading path planning and Mars rover navigation, showing how infrequent observations and mode transitions shape trajectories and observation strategies. The results provide a practical method for planning under intermittent mode information, with potential applications in security, robotics, and autonomous navigation, and point to extensions to Stackelberg security games and more general switching dynamics.
Abstract
Piecewise-deterministic Markov processes (PDMPs) are often used to model abrupt changes in the global environment or capabilities of a controlled system. This is typically done by considering a set of "operating modes" (each with its own system dynamics and performance metrics) and assuming that the mode can switch stochastically while the system state evolves. Such models have a broad range of applications in engineering, economics, manufacturing, robotics, and biological sciences. Here, we introduce and analyze an "occasionally observed" version of mode-switching PDMPs. We show how such systems can be controlled optimally if the planner is not alerted to mode-switches as they occur but may instead have access to infrequent mode observations. We first develop a general framework for handling this through dynamic programming on a higher-dimensional mode-belief space. While quite general, this method is rarely practical due to the curse of dimensionality. We then discuss assumptions that allow for solving the same problem much more efficiently, with the computational costs growing linearly (rather than exponentially) with the number of modes. We use this approach to derive Hamilton-Jacobi-Bellman PDEs and quasi-variational inequalities encoding the optimal behavior for a variety of planning horizons (fixed, infinite, indefinite, random) and mode-observation schemes (at fixed times or on-demand). We discuss the computational challenges associated with each version and illustrate the resulting methods on test problems from surveillance-evading path planning. We also include an example based on robotic navigation: a Mars rover that minimizes the expected time to target while accounting for the possibility of unobserved/incremental damages and dynamics-altering breakdowns.
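To illustrate why constant switching rates can keep the belief description small, the sketch below propagates a mode belief between observations for a continuous-time Markov chain. It is an illustration under stated assumptions, not necessarily the construction used in the paper: the generator matrix `Q` (with `Q[i, j]` the rate of switching from mode `i` to mode `j`, rows summing to zero), the helper names, and the two-mode rates are all hypothetical. Under these assumptions, the belief after an observation depends only on the last observed mode and the time elapsed since then, so it can be indexed by a (mode, elapsed time) pair rather than by a point in the full belief simplex.

```python
import numpy as np


def transition_matrix(Q, t, squarings=10, terms=12):
    """Approximate exp(Q * t) with a scaled Taylor series plus repeated squaring.

    Q is an assumed constant generator matrix: Q[i, j] >= 0 for i != j is the
    switching rate from mode i to mode j, and each row sums to zero.
    """
    A = np.asarray(Q, dtype=float) * t / (2 ** squarings)
    n = A.shape[0]
    P = np.eye(n)
    term = np.eye(n)
    # Taylor series of exp(A) for the scaled (small-norm) matrix A.
    for k in range(1, terms + 1):
        term = term @ A / k
        P = P + term
    # Undo the scaling: exp(Q t) = (exp(A)) ** (2 ** squarings).
    for _ in range(squarings):
        P = P @ P
    return P


def belief_since_observation(Q, observed_mode, elapsed):
    """Mode probabilities `elapsed` time units after observing `observed_mode`.

    With constant rates this is row `observed_mode` of exp(Q * elapsed).
    """
    return transition_matrix(Q, elapsed)[observed_mode]


# Hypothetical two-mode example: rate 0.5 for mode 0 -> 1, rate 0.2 for 1 -> 0.
Q = np.array([[-0.5, 0.5],
              [0.2, -0.2]])
belief = belief_since_observation(Q, observed_mode=0, elapsed=1.0)
print(belief)
```

In this sketch the belief is a deterministic function of two quantities, so a planner would only need to track the last observed mode and the elapsed time; a fresh observation simply resets the elapsed time to zero.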
