Approximate solution of stochastic infinite horizon optimal control problems for constrained linear uncertain systems

Eunhyek Joa, Francesco Borrelli

TL;DR

This work introduces a data-driven MPC framework to approximate infinite-horizon, constrained linear stochastic optimal control with convex stage costs and bounded disturbances. By combining a one-step horizon MPC with episodic learning of a safe, robust terminal set and a convex, piecewise-affine value function, the method achieves recursive feasibility, robust constraint satisfaction, and convergence in probability to a target set, while guaranteeing near-optimality in a local region as episodes accrue. The approach replaces intractable infinite-horizon dynamics with a tractable disturbance-sampling approximation of the terminal cost and a forward-learning loop that expands the safe set and refines the value function, guided by a novel exploration strategy. Empirical results show the proposed method outperforms a certainty-equivalent LMPC in expected cost and is significantly faster than a global value-iteration method, highlighting its practicality for real-time, safety-critical applications.

Abstract

We propose a Model Predictive Control (MPC) with a single-step prediction horizon to approximate the solution of infinite horizon optimal control problems with the expected sum of convex stage costs for constrained linear uncertain systems. The proposed method aims to enhance a given sub-optimal controller, leveraging data to achieve a nearly optimal solution for the infinite horizon problem. The method is built on two techniques. First, we estimate the expected values of the convex costs using a computationally tractable approximation, achieved by sampling across the space of disturbances. Second, we implement a data-driven approach to approximate the optimal value function and its corresponding domain, through systematic exploration of the system's state space. These estimates are subsequently used to calculate the terminal cost and terminal set within the proposed MPC. We prove recursive feasibility, robust constraint satisfaction, and convergence in probability to the target set. Furthermore, we prove that the estimated value function converges to the optimal value function in a local region. The effectiveness of the proposed MPC is illustrated with detailed numerical simulations and comparisons with a value iteration method and a Learning MPC that minimizes a certainty equivalent cost.

Paper Structure (38 sections, 16 theorems, 102 equations, 6 figures, 1 table)


Key Result

Proposition 1

Suppose the value function $Q^{j}(\cdot)$ is convex. Let Assumptions \ref{assum: bounded noise}-\ref{assum: Stage cost} hold. Then the value function $Q^{j}(\cdot)$ satisfies the following inequality for all $x \in \mathcal{SS}^{j}$:

Figures (6)

  • Figure 1: Overall block diagram of the proposed approach. Online, we solve a tractable form of the MPC \ref{eq:MPC}, presented in \ref{eq:MPC discrete}, and apply the optimal input to the system \ref{eq:system}. Offline, we store the closed-loop states and the associated inputs, and update the terminal set $\mathcal{SS}^{j}$ and the terminal cost $Q^{j}(\cdot)$ as presented in Sec. \ref{sec: learning terminal set and cost}.
  • Figure 2: A two-dimensional example of the discretization method. The dark gray polytope is the disturbance set $\mathcal{W}$. All dots are the discretized disturbances. The blue dots denote randomly sampled points while the red dots denote vertices of the disturbance set $\mathcal{W}$.
  • Figure 3: A two-dimensional example of the distance measure \ref{eq: distance measure definition}. The light blue polytope is the nominal safe set $\mathcal{SS}^{j-1} \ominus \mathcal{W} = \{x \mid H x \leq h\}$ in \ref{subeq: terminal constraints reform MPC}. The red line represents the active constraint.
  • Figure 4: Sample mean of the realized cost for episode $j$.
  • Figure 5: Comparison: (a) Value function at $x_S$ calculated from the value iteration method and (b) Terminal cost $Q^j(x_S)$ of the proposed MPC.
  • ...and 1 more figure
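
The discretization in Figure 2 and the tightened set in Figure 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the paper: the disturbance set is taken to be an axis-aligned box, the sampled disturbances are weighted equally when averaging, the stage cost is a stand-in 1-norm, and all function names (`discretize_box`, `sample_mean_cost`, `tighten_polytope`) are hypothetical. The only standard fact used is that, for a polytope $\{x \mid Hx \leq h\}$ and a polytopic $\mathcal{W}$, the Pontryagin difference is obtained by reducing each $h_i$ by the maximum of $H_i w$ over the vertices of $\mathcal{W}$.

```python
import numpy as np
from itertools import product


def discretize_box(lower, upper, n_random, rng):
    """Return the vertices of the box [lower, upper] and n_random uniform samples in it.

    The box shape is an assumption made for illustration only.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Every combination of per-axis lower/upper bounds is a vertex of the box.
    vertices = np.array(list(product(*zip(lower, upper))))
    samples = rng.uniform(lower, upper, size=(n_random, lower.size))
    return vertices, samples


def sample_mean_cost(cost, x_next_nominal, disturbances):
    """Equal-weight average of cost(x_next_nominal + w) over the disturbances.

    Equal weighting is an assumption; the paper's weighting is not stated here.
    """
    return float(np.mean([cost(x_next_nominal + w) for w in disturbances]))


def tighten_polytope(H, h, vertices):
    """Right-hand side of {x | Hx <= h} minus conv(vertices) (Pontryagin difference)."""
    # Row-wise support of the disturbance set: max over vertices of H_i w.
    support = (H @ vertices.T).max(axis=1)
    return h - support


rng = np.random.default_rng(0)
vertices, samples = discretize_box([-0.1, -0.1], [0.1, 0.1], n_random=50, rng=rng)
points = np.vstack([vertices, samples])  # red dots and blue dots of Figure 2


def cost(x):
    # Convex stand-in cost (1-norm), chosen only for the example.
    return float(np.sum(np.abs(x)))


estimate = sample_mean_cost(cost, np.array([1.0, -0.5]), points)

# Unit box {x | Hx <= h} standing in for a safe set, tightened by the disturbance box.
H = np.vstack([np.eye(2), -np.eye(2)])
h = np.ones(4)
h_tight = tighten_polytope(H, h, vertices)
```

Including the vertices alongside the random samples means the extreme disturbances are always among the evaluated points, which is what allows the same point set to serve both the cost estimate and the robust constraint tightening in this sketch.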

Theorems & Definitions (48)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Proposition 1
  • proof
  • Remark 5
  • Remark 6
  • Remark 7
  • Proposition 2
  • ...and 38 more