Value Function Approximation for Nonlinear MPC: Learning a Terminal Cost Function with a Descent Property

T. M. J. T. Baltussen, C. A. Orrico, A. Katriniok, W. P. M. H. Heemels, D. Krishnamoorthy

TL;DR

This work reduces the online computational burden of nonlinear MPC by learning a terminal cost function through supervised learning while preserving stability via a descent property. Rather than enforcing the descent property by construction, it imposes the property on a finite set of states and uses scenario optimization to obtain probabilistic guarantees that it holds over most of the continuous state space. The method supports nonconvex terminal costs and allows the prediction horizon to be shortened while maintaining good closed-loop performance, as demonstrated on a continuous stirred-tank reactor (CSTR) using expert demonstrations. The results show closed-loop performance comparable to long-horizon MPC with a significantly shorter horizon and lower computational cost, together with a probabilistic stability certificate and the flexibility to adapt the optimal control problem (OCP) to changing constraints.

Abstract

We present a novel method to synthesize a terminal cost function for a nonlinear model predictive controller (MPC) through value function approximation using supervised learning. Existing methods enforce a descent property on the terminal cost function by construction, thereby restricting the class of terminal cost functions, which in turn can limit the performance and applicability of the MPC. We present a method to approximate the true cost-to-go with a general function approximator that is convex in its parameters, and impose the descent condition on a finite number of states. Through the scenario approach, we provide probabilistic guarantees on the descent condition of the terminal cost function over the continuous state space. We demonstrate and empirically verify our method in a numerical example. By learning a terminal cost function, the prediction horizon of the MPC can be significantly reduced, resulting in reduced online computational complexity while maintaining good closed-loop performance.
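As a rough illustration of the idea described in the abstract, the fitting problem can be sketched as a constrained regression over $M$ sampled states. This is only a sketch under stated assumptions: the dynamics $f$, the stage cost $\ell$, the expert inputs $u_i$, and the squared-error objective are notation introduced here for illustration and are not taken from the paper; $\mathcal{J}$, $\mathcal{V}$, $\theta$ and $M$ follow the symbols used in the figure captions.

$$
\begin{aligned}
\theta^*_M \in \arg\min_{\theta} \quad & \sum_{i=1}^{M} \bigl( \mathcal{V}(x_i, \theta) - \mathcal{J}(x_i) \bigr)^2 \\
\text{s.t.} \quad & \mathcal{V}\bigl(f(x_i, u_i), \theta\bigr) - \mathcal{V}(x_i, \theta) \le -\ell(x_i, u_i), \qquad i = 1, \dots, M .
\end{aligned}
$$

If, as a further assumption, $\mathcal{V}(x, \theta) = \theta^\top \phi(x)$ is linear in $\theta$ for some fixed feature map $\phi$, then each descent constraint is a linear inequality in $\theta$ and the problem is a convex quadratic program, even though $\mathcal{V}$ may be nonconvex in $x$.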

Paper Structure

This paper contains 22 sections, 3 theorems, 24 equations, 4 figures, 2 tables.

Key Result

Lemma III.2

($\epsilon$-$\beta$ Result Campi_CVX_2008) Under the existence and uniqueness of the solution to $RP_M$: where $\epsilon$ denotes the violation parameter and $\beta$ denotes the confidence parameter.
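For orientation, bounds of this kind in the scenario-optimization literature are commonly written as a binomial tail: for a convex scenario program with $d$ decision variables and $M$ sampled constraints, the probability that the violation of the scenario solution exceeds $\epsilon$ is at most $\sum_{i=0}^{d-1} \binom{M}{i} \epsilon^i (1-\epsilon)^{M-i}$. The sketch below evaluates that standard expression and searches for the smallest $M$ that pushes it below $\beta$. It assumes this standard form; the exact statement used in the paper is not reproduced in this summary, and the function names and the symbol $d$ are hypothetical.

```python
from math import comb


def scenario_beta(M: int, d: int, eps: float) -> float:
    """Binomial-tail bound on the probability that the scenario solution
    has violation probability larger than eps.

    M   : number of sampled constraints (scenarios)
    d   : number of decision variables (assumed notation, not from the source)
    eps : violation parameter in (0, 1)
    """
    return sum(comb(M, i) * eps**i * (1.0 - eps) ** (M - i) for i in range(d))


def min_scenarios(d: int, eps: float, beta: float) -> int:
    """Smallest M >= d such that the bound above is at most beta.

    Uses a plain linear search, which is adequate for a small illustration.
    """
    M = d
    while scenario_beta(M, d, eps) > beta:
        M += 1
    return M


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    print(min_scenarios(d=1, eps=0.1, beta=0.05))
```

With a single decision variable the bound reduces to $(1-\epsilon)^M$, so the call above returns 29, the smallest $M$ with $0.9^M \le 0.05$. Tighter $\epsilon$ or $\beta$, or more decision variables, increase the required number of samples.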

Figures (4)

  • Figure 1: Position of the proposed method against related methods.
  • Figure 2: Phase plots of twelve closed-loop trajectories of (a) the expert MPC $(T=50)$ and the proposed MPC $(N=1)$ from $S_M$ with $M=2842$, (b) without and (c) with the descent constraint \ref{eq:Scenario_Descent}. In (a), the color-graded level sets depict the true cost-to-go $\mathcal{J}(x)$; in (b) and (c), the color-graded dots depict the learned cost-to-go $\mathcal{V}(x, \theta^*_M)$ evaluated at the training points. Red dots in (b) indicate violations of the descent condition in the test data set. The green cross indicates the set point of the MPC. The nonconvexity of $\mathcal{V}$ can be observed in (c).
  • Figure 3: Phase plots of twelve closed-loop trajectories of the proposed MPC $(N=1)$ with the learned cost-to-go from $S_M$. The color-graded dots depict the learned cost-to-go $\mathcal{V}(x, \theta^*_M)$ evaluated at the training points. Red dots indicate violations of the descent condition in the test data set. The green cross indicates the set point of the MPC.
  • Figure 4: Phase plot of the proposed MPC $(N=1)$ trained with $M = 2842$ training points and the expert MPC $(T=50)$. Soft constraints are used for the adapted state constraint, which is indicated by the dotted line.

Theorems & Definitions (7)

  • Definition II.3
  • Definition III.1
  • Lemma III.2
  • Proposition III.3
  • proof
  • Theorem IV.1
  • proof