Economic Model Predictive Control as a Solution to Markov Decision Processes

Dirk Reinhardt, Akhil S. Anand, Shambhuraj Sawant, Sebastien Gros

TL;DR

This work investigates how Economic Model Predictive Control (EMPC) can serve as a practical surrogate for solving Markov Decision Processes (MDPs) in stochastic settings. It formalizes the connection between EMPC and MDPs via the action-value function and derives conditions under which an MPC scheme can achieve closed-loop optimality, notably through a key result that adjusts the MPC cost with a delta term derived from the true MDP value function. The chapter extends dissipativity concepts to MDPs, discusses the implications of model choice (e.g., expected-value models) for optimality, and provides deterministic and stochastic-LQR cases where local optimality can be attained. It also presents illustrative examples to elucidate when EMPC succeeds or fails as an MDP solver and argues for a holistic approach where the EMPC model is designed to approximate the MDP solution rather than serve as a stand-alone model. Collectively, the work clarifies the practical conditions and caveats for using EMPC to obtain near- or locally optimal policies in stochastic environments, highlighting the importance of model selection, discounting, and problem structure for closed-loop performance.

Abstract

Markov Decision Processes (MDPs) offer a fairly generic and powerful framework to discuss the notion of optimal policies for dynamic systems, in particular when the dynamics are stochastic. However, computing the optimal policy of an MDP can be very difficult due to the curse of dimensionality present in solving the underlying Bellman equations. Model Predictive Control (MPC) is a very popular technique for building control policies for complex dynamic systems. Historically, MPC has focused on constraint satisfaction and steering dynamic systems towards a user-defined reference. More recently, Economic MPC was proposed as a computationally tractable way of building optimal policies for dynamic systems. When stochasticity is present, Economic MPC is close to the MDP framework. In that context, Economic MPC can be construed as a tractable heuristic to provide approximate solutions to MDPs. However, there is arguably a knowledge gap in the literature regarding these approximate solutions and the conditions for an MPC scheme to achieve closed-loop optimality. This chapter aims to clarify this approximation pedagogically, to provide the conditions for MPC to deliver optimal policies, and to explore some of their consequences.

Paper Structure

This paper contains 29 sections, 2 theorems, 72 equations, 9 figures.

Key Result

Theorem 1

If the MPC model $\boldsymbol{\mathrm{f}}$ satisfies the equality [...] and $T = V^{\star}$ is used in the MPC scheme \ref{['eq:MPC:Qmodel']}, then \ref{['eq:MPC:PerfectModel:PlusConstant']} holds.

Figures (9)

  • Figure 1: Different pathways present in a stochastic process for reaching a state $\boldsymbol{\mathrm{s}}_k$ (for $k=10$) starting from initial state $\boldsymbol{\mathrm{s}}_0$.
  • Figure 2: Markov chains for a policy $\boldsymbol{\mathrm{\pi}}$ starting from an initial state $\boldsymbol{\mathrm{s}}_0$ along with regions for which the modified stage cost $\ell$ is bounded. Constraint violations resulting in infinite cost are highlighted as red blobs.
  • Figure 3: Optimal policies for a) the cumulative cost criterion \ref{['eq:OptPolicy']} (black), b) the gain optimality criterion \ref{['eq:Gain:Policy']} (red), and c) the bias-optimality criterion \ref{['eq:Bias:Policy']} (blue)
  • Figure 4: Illustration of the difference between state transition models that conform to either the expected value, as defined in \ref{['eq:E:Fitting']}, or the maximum likelihood, as given in \ref{['eq:ML:Fitting']}, of the MDP's state transition probability density function $\rho$.
  • Figure 5: Illustration of the energy storage example in \ref{['sec:BatExample']}. The optimal value function and policy of the MDP, $V^\star, \pi^\star$ (black), are approximated by the MPC with $V^\mathrm{MPC},\pi^\mathrm{MPC}$ (red). Despite the nearly linear value functions, the policies differ significantly due to the activation of the lower bound.
  • ...and 4 more figures
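
The distinction drawn in the Figure 4 caption, between a model fitted to the expected value and one fitted to the maximum likelihood of the transition density $\rho$, can be illustrated numerically. The sketch below is not taken from the paper: the bimodal density, its parameters, and the function names are hypothetical, chosen only to show that the two fits can give very different one-step predictions for the same $\rho$.

```python
import numpy as np


def expected_value_model(grid, density):
    """Expected-value fit: the mean of the next-state density."""
    weights = density / density.sum()  # normalise on the uniform grid
    return float(np.sum(grid * weights))


def maximum_likelihood_model(grid, density):
    """Maximum-likelihood fit: the mode of the next-state density."""
    return float(grid[np.argmax(density)])


def gaussian(x, mean, std):
    """Gaussian probability density, used to build the example density."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


# Hypothetical bimodal next-state density for one fixed state-action pair
# (an assumption for illustration, not the density used in the paper).
grid = np.linspace(-5.0, 5.0, 2001)
density = 0.7 * gaussian(grid, -1.0, 0.3) + 0.3 * gaussian(grid, 2.0, 0.3)

f_expected = expected_value_model(grid, density)
f_ml = maximum_likelihood_model(grid, density)

print(f"expected-value prediction:     {f_expected:.3f}")
print(f"maximum-likelihood prediction: {f_ml:.3f}")
```

For this assumed density, the maximum-likelihood prediction sits on the dominant mode, while the expected-value prediction falls between the two modes, in a region the true next state almost never visits. Which of the two fits, if either, is appropriate for an MPC model is the question the chapter addresses.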

Theorems & Definitions (2)

  • Theorem 1
  • Lemma 1