Approximate solution of stochastic infinite horizon optimal control problems for constrained linear uncertain systems
Eunhyek Joa, Francesco Borrelli
TL;DR
This work introduces a data-driven MPC framework that approximates the solution of infinite-horizon stochastic optimal control problems for constrained linear systems with convex stage costs and bounded disturbances. It combines a one-step-horizon MPC with episodic learning of a robust safe terminal set and a convex, piecewise-affine value function. The resulting controller is recursively feasible, satisfies the constraints robustly, and converges in probability to a target set, and the learned value function approaches the optimal one in a local region as episodes accrue. The intractable expectation in the infinite-horizon cost is replaced by a tractable disturbance-sampling approximation of the terminal cost, while a forward-learning loop guided by a novel exploration strategy expands the safe set and refines the value function. In numerical simulations, the method achieves a lower expected cost than a certainty-equivalent LMPC and runs significantly faster than a global value-iteration method, which supports its practicality for real-time, safety-critical applications.
Abstract
We propose a Model Predictive Control (MPC) scheme with a single-step prediction horizon to approximate the solution of infinite horizon optimal control problems for constrained linear uncertain systems, where the objective is the expected sum of convex stage costs. The proposed method aims to improve a given sub-optimal controller, leveraging data to obtain a nearly optimal solution of the infinite horizon problem. The method is built on two techniques. First, we estimate the expected values of the convex costs using a computationally tractable approximation obtained by sampling the space of disturbances. Second, we use a data-driven approach to approximate the optimal value function and its domain through systematic exploration of the system's state space. These estimates are then used as the terminal cost and terminal set of the proposed MPC. We prove recursive feasibility, robust constraint satisfaction, and convergence in probability to the target set. Furthermore, we prove that the estimated value function converges to the optimal value function in a local region. The effectiveness of the proposed MPC is illustrated with detailed numerical simulations and comparisons with a value iteration method and a Learning MPC that minimizes a certainty equivalent cost.
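The first technique, estimating an expected convex cost by sampling the disturbance space, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the system matrices, the box-vertex disturbance samples, the piecewise-affine value function, and the names `pwa_value`, `expected_terminal_cost`, and `one_step_control`. The sketch also replaces the convex program a real MPC would solve with a search over a finite input grid, and it omits the state, input, and terminal-set constraints.

```python
import numpy as np

def pwa_value(x, slopes, offsets):
    """Convex piecewise-affine function: max_i (slopes[i] @ x + offsets[i])."""
    return float(np.max(slopes @ x + offsets))

def expected_terminal_cost(x, u, A, B, w_samples, slopes, offsets):
    """Sample-average estimate of E[Q(A x + B u + w)] over disturbance samples."""
    x_nom = A @ x + B @ u
    return float(np.mean([pwa_value(x_nom + w, slopes, offsets) for w in w_samples]))

def one_step_control(x, A, B, w_samples, slopes, offsets, u_grid, stage_cost):
    """Pick the grid input minimizing stage cost plus sampled terminal cost.

    Grid search is a stand-in for the convex optimization of an actual MPC;
    constraints and the terminal set are deliberately left out.
    """
    costs = [
        stage_cost(x, u) + expected_terminal_cost(x, u, A, B, w_samples, slopes, offsets)
        for u in u_grid
    ]
    return u_grid[int(np.argmin(costs))]

# Hypothetical double-integrator data (illustrative only).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
# Disturbance samples: vertices of the box [-0.1, 0.1]^2.
w_samples = np.array([[0.1, 0.1], [0.1, -0.1], [-0.1, 0.1], [-0.1, -0.1]])
# Q(x) = max(|x1|, |x2|), written as a maximum of affine pieces.
slopes = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
offsets = np.zeros(4)

def stage_cost(x, u):
    """Quadratic stage cost, chosen here only as a simple convex example."""
    return float(x @ x + u @ u)

u_grid = [np.array([v]) for v in np.linspace(-1.0, 1.0, 5)]
x0 = np.array([1.0, 0.0])

q_hat = expected_terminal_cost(x0, np.array([0.0]), A, B, w_samples, slopes, offsets)
u_star = one_step_control(x0, A, B, w_samples, slopes, offsets, u_grid, stage_cost)
```

Because the value function is a maximum of affine pieces, averaging it over a finite set of disturbance samples keeps the objective convex in the input, which is what makes a sampled terminal cost tractable inside a one-step optimization.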
