Value Function Approximation for Nonlinear MPC: Learning a Terminal Cost Function with a Descent Property
T. M. J. T. Baltussen, C. A. Orrico, A. Katriniok, W. P. M. H. Heemels, D. Krishnamoorthy
TL;DR
This work reduces the online computational burden of nonlinear MPC by learning a terminal cost function through supervised learning, while retaining a descent property that supports stability. The descent condition is enforced only on a finite set of sampled states, and scenario optimization then provides probabilistic guarantees that it holds over the continuous state space. The method admits nonconvex terminal costs, with a function approximator that remains convex in its parameters, and allows the prediction horizon to be shortened while maintaining good closed-loop performance. It is demonstrated on a continuous stirred-tank reactor (CSTR) using expert demonstrations. The resulting short-horizon controller achieves closed-loop performance comparable to long-horizon MPC at lower computational cost, comes with a probabilistic certificate on the descent condition, and keeps the flexibility to adapt the optimal control problem (OCP) to changing constraints.
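To give a feel for what a scenario-based probabilistic guarantee looks like, the sketch below evaluates one classical scenario bound for convex programs (due to Campi and Garatti). With d decision variables and N independently sampled constraints, the probability that the solution violates the constraint on more than a fraction epsilon of the state space is at most the sum over i = 0..d-1 of C(N, i) epsilon^i (1 - epsilon)^(N - i). Using this particular bound is an assumption made for illustration; the certificate derived in the paper may take a different form. The function names `scenario_confidence` and `min_samples` are hypothetical.

```python
from math import comb


def scenario_confidence(num_samples, num_params, epsilon):
    """Classical scenario bound (assumed here for illustration).

    Returns an upper bound on the probability that a convex scenario
    program with `num_params` decision variables, solved on `num_samples`
    i.i.d. sampled constraints, yields a solution whose violation
    probability exceeds `epsilon`.
    """
    return sum(
        comb(num_samples, i) * epsilon**i * (1.0 - epsilon) ** (num_samples - i)
        for i in range(num_params)
    )


def min_samples(num_params, epsilon, beta):
    """Smallest sample count for which the bound drops to `beta` or below.

    A plain linear search; adequate for small illustrative problems.
    """
    n = num_params
    while scenario_confidence(n, num_params, epsilon) > beta:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical numbers: 5 parameters, 5% violation level, 1e-3 confidence.
    print(min_samples(num_params=5, epsilon=0.05, beta=1e-3))
```

The bound tightens as more states are sampled and loosens as the number of parameters grows, which is the trade-off behind imposing the descent condition on finitely many states.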
Abstract
We present a novel method to synthesize a terminal cost function for a nonlinear model predictive controller (MPC) through value function approximation using supervised learning. Existing methods enforce a descent property on the terminal cost function by construction, thereby restricting the class of terminal cost functions, which in turn can limit the performance and applicability of the MPC. We present a method to approximate the true cost-to-go with a general function approximator that is convex in its parameters, and impose the descent condition on a finite number of states. Through the scenario approach, we provide probabilistic guarantees on the descent condition of the terminal cost function over the continuous state space. We demonstrate and empirically verify our method in a numerical example. By learning a terminal cost function, the prediction horizon of the MPC can be significantly reduced, resulting in reduced online computational complexity while maintaining good closed-loop performance.
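The core idea of fitting a cost-to-go approximation under sampled descent constraints can be sketched on a toy problem. Everything in this example is an assumption chosen for illustration and is not taken from the paper: a scalar linear system, a quadratic stage cost, a fixed "expert" feedback gain, and a one-parameter terminal cost V(x) = theta * x^2. The descent condition is assumed to have the common form V(x_next) - V(x) <= -stage_cost(x, u). Because V is linear in theta, the fit is a convex problem, and with a single parameter its constrained least-squares solution is simply the unconstrained fit clipped to the feasible interval.

```python
# Hypothetical scalar example: x_next = A*x + B*u, stage cost Q*x^2 + R*u^2.
A, B, Q, R = 1.2, 1.0, 1.0, 1.0
K = 0.7  # assumed expert feedback u = -K*x (illustrative only)


def step(x):
    """Apply the expert input once; return the next state and stage cost."""
    u = -K * x
    return A * x + B * u, Q * x * x + R * u * u


def truncated_cost(x, horizon):
    """Cost accumulated over a finite rollout, used as regression target."""
    total = 0.0
    for _ in range(horizon):
        x, cost = step(x)
        total += cost
    return total


def fit_theta(states, horizon=3):
    """Least-squares fit of V(x) = theta*x^2 with sampled descent constraints.

    Each sampled state contributes the linear constraint
    theta * (x_next^2 - x^2) <= -stage_cost, i.e. a lower bound on theta.
    Returns (constrained theta, unconstrained least-squares theta).
    """
    num = den = 0.0
    lower = 0.0
    for x in states:
        phi = x * x
        num += phi * truncated_cost(x, horizon)
        den += phi * phi
        x_next, cost = step(x)
        decrease = phi - x_next * x_next
        if decrease > 0.0:
            lower = max(lower, cost / decrease)
        elif cost > 0.0:
            raise ValueError(f"descent condition infeasible at x={x}")
    theta_ls = num / den
    # Projecting onto [lower, inf) solves the 1-D constrained problem exactly.
    return max(theta_ls, lower), theta_ls


if __name__ == "__main__":
    theta, theta_ls = fit_theta([-2.0, -1.0, 0.5, 1.5])
    print(theta_ls, theta)
```

In this toy setting the truncated rollouts underestimate the infinite-horizon cost, so the unconstrained fit alone would violate the descent condition; the sampled constraints push theta up to the smallest value that satisfies it at every sample. With several parameters the same structure becomes a quadratic program with linear inequality constraints.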
