Economic Model Predictive Control as a Solution to Markov Decision Processes
Dirk Reinhardt, Akhil S. Anand, Shambhuraj Sawant, Sebastien Gros
TL;DR
This work investigates how Economic Model Predictive Control (EMPC) can serve as a practical surrogate for solving Markov Decision Processes (MDPs) in stochastic settings. It formalizes the connection between EMPC and MDPs via the action-value function and derives conditions under which an MPC scheme can achieve closed-loop optimality, notably through a key result that adjusts the MPC cost with a delta term derived from the true MDP value function. The chapter extends dissipativity concepts to MDPs, discusses the implications of model choice (e.g., expected-value models) for optimality, and provides deterministic and stochastic-LQR cases where local optimality can be attained. It also presents illustrative examples to elucidate when EMPC succeeds or fails as an MDP solver and argues for a holistic approach where the EMPC model is designed to approximate the MDP solution rather than serve as a stand-alone model. Collectively, the work clarifies the practical conditions and caveats for using EMPC to obtain near- or locally optimal policies in stochastic environments, highlighting the importance of model selection, discounting, and problem structure for closed-loop performance.
Abstract
Markov Decision Processes (MDPs) offer a fairly generic and powerful framework to discuss the notion of optimal policies for dynamic systems, in particular when the dynamics are stochastic. However, computing the optimal policy of an MDP can be very difficult due to the curse of dimensionality present in solving the underlying Bellman equations. Model Predictive Control (MPC) is a very popular technique for building control policies for complex dynamic systems. Historically, MPC has focused on constraint satisfaction and steering dynamic systems towards a user-defined reference. More recently, Economic MPC was proposed as a computationally tractable way of building optimal policies for dynamic systems. When stochasticity is present, Economic MPC is close to the MDP framework. In that context, Economic MPC can be construed as a tractable heuristic to provide approximate solutions to MDPs. However, there is arguably a knowledge gap in the literature regarding these approximate solutions and the conditions for an MPC scheme to achieve closed-loop optimality. This chapter aims to clarify this approximation pedagogically, to provide the conditions for MPC to deliver optimal policies, and to explore some of their consequences.
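For reference, the Bellman equations mentioned above can be written as follows for a discounted MDP in their standard textbook form. The notation (state s, action a, successor state s_+, stage cost L, discount factor \gamma, transition density \rho) is assumed here for illustration and may differ from the notation used in the chapter.

```latex
% Standard Bellman equations for a discounted MDP (assumed notation):
%   s: state, a: action, s_+: successor state drawn from \rho(\cdot \mid s, a),
%   L: stage cost, \gamma \in (0, 1): discount factor.
\begin{align}
  Q^\star(s, a) &= L(s, a) + \gamma\, \mathbb{E}_{s_+ \sim \rho(\cdot \mid s, a)}\!\left[ V^\star(s_+) \right], \\
  V^\star(s)    &= \min_{a}\, Q^\star(s, a), \\
  \pi^\star(s)  &\in \arg\min_{a}\, Q^\star(s, a).
\end{align}
```

Solving these equations on a gridded state space requires a number of grid points that grows exponentially with the state dimension, which is the curse of dimensionality referred to above. Since the optimal policy is a minimizer of the action-value function, an MPC scheme can be compared to the MDP solution through the action-value function that the scheme itself defines.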
