Bridging Impulse Control of Piecewise Deterministic Markov Processes and Markov Decision Processes: Frameworks, Extensions, and Open Challenges
Alice Cleynen, Benoîte de Saporta, Orlane Rossini, Régis Sabbadin, Amélie Vernay
TL;DR
The article examines how Piecewise Deterministic Markov Processes (PDMPs) and Markov Decision Processes (MDPs) can be bridged to address impulse control and sequential decision problems in stochastic, hybrid systems. It develops a formal correspondence by embedding impulse control problems for PDMPs into MDPs (and extends to partial observation and Bayesian model uncertainty), while also showing how PDMPs can model continuous-time problems via tractable, time-augmented MDP constructs. A running medical case study anchors the theory, illustrating how impulse decisions, observations, and model learning interact across fully and partially observed settings. The work highlights open questions in scalability, exact vs. approximate solutions, and the potential of Monte Carlo and reinforcement learning methods to tackle the resulting high-dimensional, continuous-space problems, thereby advancing stochastic control, decision theory, statistics, and reinforcement learning at their intersection.
Abstract
Control theory plays a pivotal role in understanding and optimizing the behavior of complex dynamical systems across various scientific and engineering disciplines. Two key frameworks that have emerged for modeling and solving control problems in stochastic systems are piecewise deterministic Markov processes (PDMPs) and Markov decision processes (MDPs). Each framework has its own strengths, and their intersection offers promising opportunities for tackling a broad class of problems, particularly in the context of impulse controls and decision-making in complex systems. The relationship between PDMPs and MDPs is a natural subject of exploration, as embedding impulse control problems for PDMPs into the MDP framework could open new avenues for their analysis and resolution. Specifically, this integration would make it possible to leverage the computational and theoretical tools developed for MDPs to address the challenges inherent in PDMPs. On the other hand, PDMPs can offer a versatile and simple paradigm to model continuous-time problems that are often described as discrete-time MDPs parametrized by complex transition kernels. This transformation has the potential to bridge the gap between the two frameworks, enabling solutions to previously intractable problems and expanding the scope of both fields. This paper presents a comprehensive review of two research domains, illustrated through a recurring medical example. The example is revisited and progressively formalized within the framework of the various concepts and objects introduced.
