Occasionally Observed Piecewise-deterministic Markov Processes

Marissa Gee, Alexander Vladimirsky

TL;DR

This paper addresses optimal control of PDMPs when mode switches are unobserved and mode observations are infrequent. It (i) formulates PDE-based optimal-control problems (HJB and QVI) for finite, infinite, and indefinite horizons under various observation schemes, and (ii) introduces a reduced-belief approach that remains tractable when switching rates are constant, yielding linear-in-$M$ complexity. The framework is demonstrated on surveillance-evading path planning and Mars rover navigation, showing how infrequent observations and mode transitions shape trajectories and observation strategies. The results provide a practical method for planning under intermittent mode information with broad potential applications in security, robotics, and autonomous navigation, and point to extensions into Stackelberg security games and more general switching dynamics.

Abstract

Piecewise-deterministic Markov processes (PDMPs) are often used to model abrupt changes in the global environment or capabilities of a controlled system. This is typically done by considering a set of "operating modes" (each with its own system dynamics and performance metrics) and assuming that the mode can switch stochastically while the system state evolves. Such models have a broad range of applications in engineering, economics, manufacturing, robotics, and biological sciences. Here, we introduce and analyze an "occasionally observed" version of mode-switching PDMPs. We show how such systems can be controlled optimally if the planner is not alerted to mode-switches as they occur but may instead have access to infrequent mode observations. We first develop a general framework for handling this through dynamic programming on a higher-dimensional mode-belief space. While quite general, this method is rarely practical due to the curse of dimensionality. We then discuss assumptions that allow for solving the same problem much more efficiently, with the computational costs growing linearly (rather than exponentially) with the number of modes. We use this approach to derive Hamilton-Jacobi-Bellman PDEs and quasi-variational inequalities encoding the optimal behavior for a variety of planning horizons (fixed, infinite, indefinite, random) and mode-observation schemes (at fixed times or on-demand). We discuss the computational challenges associated with each version and illustrate the resulting methods on test problems from surveillance-evading path planning. We also include an example based on robotic navigation: a Mars rover that minimizes the expected time to target while accounting for the possibility of unobserved/incremental damages and dynamics-altering breakdowns.
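
The belief dynamics themselves are not shown in this excerpt. As a minimal sketch, assuming the mode is a continuous-time Markov chain with constant switching rates, the belief between observations follows the standard forward equation $\dot{\vb b} = \vb b\, Q$, so it depends only on the last observed mode and the time elapsed since that observation. The function names, the symmetric rate matrix, and the rate value below are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def rate_matrix(num_modes, lam):
    """Hypothetical generator Q: constant rate `lam` between every pair of distinct modes."""
    Q = np.full((num_modes, num_modes), lam, dtype=float)
    np.fill_diagonal(Q, 0.0)
    # Each row of a generator matrix must sum to zero.
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

def propagate_belief(b0, Q, t):
    """Belief t time units after the last observation: b(t) = b0 @ expm(Q * t).

    The matrix exponential is computed by eigendecomposition, which is
    valid here only because this illustrative Q is symmetric.
    """
    eigvals, eigvecs = np.linalg.eigh(Q)
    return b0 @ (eigvecs * np.exp(eigvals * t)) @ eigvecs.T

if __name__ == "__main__":
    Q = rate_matrix(4, 0.5)                    # four modes, assumed rate 0.5
    b0 = np.array([1.0, 0.0, 0.0, 0.0])        # Mode 1 was just observed
    for t in (0.0, 0.5, 1.0, 5.0):
        print(t, propagate_belief(b0, Q, t))   # drifts toward the uniform distribution
```

With these symmetric rates the belief relaxes toward the uniform distribution, which is the stationary distribution in this sketch. A new mode observation would reset the belief to the corresponding unit vector.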

Paper Structure

This paper contains 25 sections, 2 theorems, 80 equations, 11 figures, 1 table, 4 algorithms.

Key Result

Proposition 1

Suppose Assumptions assump:regularity-a:autonomous_cost_and_dynamics hold and in addition the controlled dynamics is "geometric"; i.e., $\vb f(\mathbf{x}, \mathbf{a}) = f(\mathbf{x}, \mathbf{a}) \mathbf{a},$ where $\mathbf{a} \in A = \mathbb{S}^1$ is a unit vector specifying the chosen direction of hold for all $\mathbf{x} \in \Omega$, $\mathbf{a} \in \mathbb{S}^1$, $m\in\mathcal{M}.$ Let $z(\mat

Figures (11)

  • Figure 1: "Rotating Surveillance" environment. (a) Labeled contour plots of mode surveillance patterns $K_i(\mathbf{x})$. Mode labels (boxed numbers) are placed at the peak of each $K_i$. Only one pattern is "active" at a time. (b) $K_1(\mathbf{x})$, the surveillance pattern in Mode 1. (c) $\overline{K}_s(\mathbf{x})$, the expected surveillance associated with the stationary (uniform) mode distribution.
  • Figure 2: Optimal trajectories for a finite horizon process without (top row) and with (bottom row) mode observations. Time evolution is shown across columns, with magenta dash-dotted lines representing path components new to each column and black solid lines encoding path components shown previously. Cyan dots mark the planner's initial position and position at the end of each subinterval if an observation is not received. Yellow dots indicate the latest observations, with boxed numbers specifying the observed mode, yellow arrows indicate the direction of travel, and yellow stars indicate the planner's final position. The background is the expected surveillance at the end of each subinterval given $\mu(0) = 1$ and any other received mode observations.
  • Figure 3: Optimal trajectories for an infinite horizon process with periodic mode observations. Inter-observation period $T=1.$ Time discounting factor $\beta = 0.5.$ Same visual format as in Figure 2. The solver required 19 iterations to converge with a tolerance of $10^{-6}.$
  • Figure 4: Optimal trajectories for an infinite horizon process with periodic observations, shown for three possible discount factors. Trajectories are shown for $t \in [0,4]$, corresponding to four periods. Cyan dots mark the planner's initial position. Yellow dots indicate observations; the observed modes are $\mu(1) = 2$, $\mu(2) = 3$, and $\mu(3) = 4$. Yellow arrows indicate the direction of travel. The background is $\overline{K}_s(\mathbf{x})$. As the discount rate $\beta$ increases, the future impacts the planner to a lesser degree, and the number of iterations needed to reach convergence decreases. When $\beta = 6$, the solver requires just two iterations (compared to the 19 above) to converge to within a tolerance of $10^{-6}.$
  • Figure 5: Surveillance patterns that form "barriers" along the direct path to the target (outlined in orange). Optimal trajectories are shown for $\lambda_{ij} = 0$ (no mode switches) and trajectory color encodes last observed mode (distribution). White represents Mode 1 ($\vb b(0) = \vb e_1$), black Mode 2 ($\vb b(0) = \vb e_2$), and gray the stationary distribution ($\vb b(0) = [1/2, 1/2]$). Cyan dots represent the starting location. Using the upper bound in Prop. \ref{prop:bounded-TG}, we solve the PDE over the time domain $[0, 14.83]$.
  • ...and 6 more figures
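
The expected surveillance used as the background in Figures 1(c) and 4 can be written out explicitly. As a sketch, assuming $M$ modes and a belief vector $\vb b = [b_1, \dots, b_M]$ over them (the notation $\overline{K}_{\vb b}$ is introduced here for illustration only), the expectation is a belief-weighted sum of the per-mode patterns, and the uniform belief gives $\overline{K}_s$:

```latex
\overline{K}_{\vb b}(\mathbf{x}) = \sum_{i=1}^{M} b_i \, K_i(\mathbf{x}),
\qquad
\overline{K}_s(\mathbf{x}) = \frac{1}{M} \sum_{i=1}^{M} K_i(\mathbf{x})
\quad \text{when } \vb b = [1/M, \dots, 1/M].
```

For the two-mode setting of Figure 5, the gray trajectories correspond to $\vb b(0) = [1/2, 1/2]$, for which this expression reduces to the plain average of $K_1$ and $K_2$.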

Theorems & Definitions (12)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Proposition 1
  • proof
  • Remark 5
  • Remark 6
  • Remark 7
  • Remark 8
  • ...and 2 more