Bridging Impulse Control of Piecewise Deterministic Markov Processes and Markov Decision Processes: Frameworks, Extensions, and Open Challenges

Alice Cleynen; Benoîte de Saporta; Orlane Rossini; Régis Sabbadin; Amélie Vernay

TL;DR

The article examines how Piecewise Deterministic Markov Processes (PDMPs) and Markov Decision Processes (MDPs) can be bridged to address impulse control and sequential decision problems in stochastic, hybrid systems. It develops a formal correspondence by embedding impulse control problems for PDMPs into MDPs (and extends to partial observation and Bayesian model uncertainty), while also showing how PDMPs can model continuous-time problems via tractable, time-augmented MDP constructs. A running medical case study anchors the theory, illustrating how impulse decisions, observations, and model learning interact across fully and partially observed settings. The work highlights open questions in scalability, exact vs. approximate solutions, and the potential of Monte Carlo and reinforcement learning methods to tackle the resulting high-dimensional, continuous-space problems, thereby advancing stochastic control, decision theory, statistics, and reinforcement learning at their intersection.

Abstract

Control theory plays a pivotal role in understanding and optimizing the behavior of complex dynamical systems across various scientific and engineering disciplines. Two key frameworks that have emerged for modeling and solving control problems in stochastic systems are piecewise deterministic Markov processes (PDMPs) and Markov decision processes (MDPs). Each framework has its unique strengths, and their intersection offers promising opportunities for tackling a broad class of problems, particularly in the context of impulse controls and decision-making in complex systems. The relationship between PDMPs and MDPs is a natural subject of exploration, as embedding impulse control problems for PDMPs into the MDP framework could open new avenues for their analysis and resolution. Specifically, this integration would allow leveraging the computational and theoretical tools developed for MDPs to address the challenges inherent in PDMPs. On the other hand, PDMPs can offer a versatile and simple paradigm to model continuous-time problems that are often described as discrete-time MDPs parametrized by complex transition kernels. This transformation has the potential to bridge the gap between the two frameworks, enabling solutions to previously intractable problems and expanding the scope of both fields. This paper presents a comprehensive review of these two research domains, illustrated through a recurring medical example. The example is revisited and progressively formalized within the framework of the various concepts and objects introduced.

Paper Structure

This paper contains 57 sections, 3 theorems, 66 equations, 24 figures, 1 table, 9 algorithms.

Introduction
Piecewise Deterministic Markov Processes
PDMP, definition and examples
Generic definition
Running medical example as a PDMP
Semi-Markov dynamics
Running medical example as a time-augmented PDMP
Simulation
Embedded chains
PDMPs as versatile models
Main research domains involving PDMPs
Impulse control for PDMPs
Continuous versus impulse control
Definition of an impulse control problem
Running medical example as a controlled PDMP
...and 42 more sections

Key Result

Lemma 1

Let $S_1$ and $S_2$ be two independent random variables with respective (non-constant) intensity functions $\lambda_1$ and $\lambda_2$. Set $S=\min\{S_1,S_2\}$ and $I=\mathop{\mathrm{arg\,min}}\limits\{S_1,S_2\}$. Then the intensity of $S$ is $\lambda_1+\lambda_2$ and conditionally on $(S=t)$, the d
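The first claim of the lemma (the intensity of $S$ is $\lambda_1+\lambda_2$) can be checked numerically. The sketch below is only an illustration under assumed intensities $\lambda_i(t)=c_i t$, which are not taken from the paper; the helper names `sample_time` and `empirical_survival` are hypothetical. With these intensities the survival function of $S$ should be $\exp(-(c_1+c_2)t^2/2)$.

```python
import math
import random


def sample_time(c, rng):
    """Sample a random time with intensity lambda(t) = c * t (assumed form).

    The survival function is exp(-c * t**2 / 2), inverted here in closed form.
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return math.sqrt(-2.0 * math.log(u) / c)


def empirical_survival(samples, t):
    """Fraction of samples strictly larger than t."""
    return sum(s > t for s in samples) / len(samples)


rng = random.Random(0)
c1, c2 = 1.0, 2.0  # hypothetical intensity slopes
n = 100_000

# S = min(S1, S2) for independent S1, S2
mins = [min(sample_time(c1, rng), sample_time(c2, rng)) for _ in range(n)]

t = 0.5
# Survival predicted by an intensity equal to lambda_1 + lambda_2
predicted = math.exp(-(c1 + c2) * t**2 / 2)
observed = empirical_survival(mins, t)
print(f"predicted={predicted:.4f} observed={observed:.4f}")
```

The two printed values should agree up to Monte Carlo error; the second part of the lemma (the conditional law given $S=t$) is truncated in this excerpt and is therefore not illustrated.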

Figures (24)

Figure 1: Trajectory of a generic PDMP. Starting from an initial value $x_0$ at time $t=0$, the process follows a deterministic trajectory until a jump occurs, either at a random time (as at $T_1$) or because the process reaches the state boundary (as at $T_2$). At jump times, the process jumps to a new location drawn from kernel $Q$.
Figure 2: Trajectory of the running medical example as a PDMP. The PDMP starts in mode $m=-1$ (patient under treatment) from an initial point $\mathtt{x}_0$ and follows a deterministic trajectory along its exponential flow $\Phi_{-1}$ until the first jump occurs at time $T_1$ when reaching the boundary $\mathtt{x}=\zeta_0$ (remission). The mode switches to $m=0$ and the flow is constant, equal to $\zeta_0$, until a new jump occurs at time $T_2$, triggered by an exponential clock. The mode then switches to $m=1$ (relapse of the disease) and the trajectory rises exponentially along the flow $\Phi_1$.
Figure 3: Sample trajectory of the telegraph model. The PDMP has 2 modes ($0$ and $1$), no Euclidean variable, and no boundary. It switches from one mode to the other with a constant intensity depending on the mode.
Figure 4: Sample trajectory of the stochastic SIR model. The PDMP has $(N+1)^2$ modes of the form $\mathsf{m}=(s,i)$ where $N$ is the total population size, $s$ the number of susceptible individuals, $i$ the number of infected individuals, and $r=N-s-i$ the number of removed individuals. The jump intensities are constant and depend on the mode. The only possible transitions correspond to the infection of a susceptible individual $\mathsf{m}'=(s',i')=(s-1,i+1)$ ($r$ is unchanged) or the removal of an infected individual $\mathsf{m}'=(s',i')=(s,i-1)$ ($r'=r+1$). There is no Euclidean variable and no boundary. The right-hand-side figure is a detailed section of the main graph on the left emphasizing piecewise constant trajectories.
Figure 5: Sample trajectory of the transmission control protocol (TCP) model. The PDMP has no mode, a 1-dimensional Euclidean variable with linear flow corresponding to the available window size of a communications network, and no boundary. The jump intensity increases with the window size, and jumps correspond to random congestion events resulting in a sudden decrease of the window size. The model is detailed in chafai_long_2010.
...and 19 more figures
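
The telegraph model of Figure 3 is simple enough to simulate in a few lines, which may help make the captions above concrete. This is a minimal sketch, not the paper's simulation algorithm: the function name `simulate_telegraph`, the rate values, and the time horizon are all assumptions chosen for illustration. Since the intensity is constant within each mode, each sojourn time is exponential with the rate of the current mode.

```python
import random


def simulate_telegraph(rates, t_end, mode=0, seed=0):
    """Simulate a two-mode telegraph process up to time t_end.

    rates[m] is the (constant) jump intensity in mode m.
    Returns the list of (jump_time, new_mode) pairs, starting with (0.0, mode).
    """
    rng = random.Random(seed)
    t = 0.0
    path = [(t, mode)]
    while True:
        # Constant intensity => exponential waiting time in the current mode
        t += rng.expovariate(rates[mode])
        if t >= t_end:
            break
        mode = 1 - mode  # only two modes: switch to the other one
        path.append((t, mode))
    return path


# Hypothetical rates: leave mode 0 at rate 1.0, leave mode 1 at rate 3.0
path = simulate_telegraph(rates=(1.0, 3.0), t_end=10.0)
for jump_time, new_mode in path[:5]:
    print(f"t={jump_time:.3f} mode={new_mode}")
```

There is no Euclidean variable and no boundary here, so the trajectory is fully described by the jump times and the alternating modes.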

Theorems & Definitions (34)

Definition 1: PDMP
Definition 2: Time-augmented PDMP
Lemma 1
proof
Definition 3: Impulse strategy
Definition 4: Cost of an impulse strategy
Definition 5: Value function
Definition 6: $\epsilon$-optimal strategy
Theorem 1: Dynamic programming
Definition 7: MDP
...and 24 more
