Learning the cost-to-go for mixed-integer nonlinear model predictive control
Christopher A. Orrico, W. P. M. H. Heemels, Dinesh Krishnamoorthy
TL;DR
The paper addresses the real-time solvability challenge of mixed-integer nonlinear MPC (MINMPC) for hybrid systems by replacing the long-horizon value function with an offline-learned convex cost-to-go $V(x)= x^{\mathsf{T}} P x$, enabling online optimization with a one-step horizon. The key idea is to learn $P$ via inverse optimization from offline expert trajectories, formulated as a semidefinite program that enforces $P\succeq 0$ and near-KKT consistency, so the online problem minimizes $\ell(x(t),u,z) + V(f(x(t),u,z))$ subject to $g(x(t),u,z)\le 0$. The approach is demonstrated on a Lotka-Volterra fishing problem with discrete control, showing that the IOC-imputed cost-to-go yields nearly the same performance as the full-horizon MINMPC at a much lower online cost (e.g., 217 s vs 54.1 ms per decision). This enables real-time application of MINMPC to complex hybrid systems, while the work also outlines directions to broaden the cost-to-go beyond quadratic forms and to establish stability and feasibility guarantees.
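The one-step online problem described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes a purely discrete input $z$ from a small finite set (so the minimization reduces to enumeration), drops the continuous input $u$ and the constraints $g \le 0$, and uses a hypothetical linear toy system. The names `one_step_policy`, `A`, `B`, and the numerical values of `P` are all illustrative assumptions; in the paper's setting $P$ would come from the offline inverse-optimization step.

```python
import numpy as np


def one_step_policy(x, P, dynamics, stage_cost, discrete_inputs):
    """Return the discrete input minimizing l(x, z) + V(f(x, z)),
    where V(x) = x^T P x is a quadratic cost-to-go.

    Enumeration over a finite input set stands in for the mixed-integer
    solver; this is only viable when the set is small.
    """
    best_z, best_cost = None, np.inf
    for z in discrete_inputs:
        x_next = dynamics(x, z)                # one-step prediction f(x, z)
        cost_to_go = float(x_next @ P @ x_next)  # V(f(x, z))
        cost = stage_cost(x, z) + cost_to_go
        if cost < best_cost:
            best_z, best_cost = z, cost
    return best_z, best_cost


# Hypothetical toy system (NOT the paper's example): x+ = A x + B z.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([0.0, 0.1])

# Assumed positive-definite P; in practice it would be learned offline.
P = np.array([[2.0, 0.5], [0.5, 1.0]])


def dynamics(x, z):
    return A @ x + B * z


def stage_cost(x, z):
    return float(x @ x) + 0.01 * z**2


x0 = np.array([1.0, 0.0])
z_star, cost_star = one_step_policy(x0, P, dynamics, stage_cost, [-1, 0, 1])
print(z_star, cost_star)
```

In a closed loop, `one_step_policy` would be called at every sampling instant with the current state, which is what keeps the online cost low compared with solving the full-horizon problem.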
Abstract
Application of nonlinear model predictive control (NMPC) to problems with hybrid dynamical systems, disjoint constraints, or discrete controls often results in mixed-integer formulations with both continuous and discrete decision variables. However, solving mixed-integer nonlinear programming (MINLP) problems in real time is challenging, which can be a limiting factor in many applications. To address the computational complexity of solving mixed-integer nonlinear model predictive control (MINMPC) problems in real time, this paper proposes an approximate mixed-integer NMPC formulation based on value function approximation. Leveraging Bellman's principle of optimality, the key idea here is to divide the prediction horizon into two parts, where the optimal value function of the latter part of the prediction horizon is approximated offline using expert demonstrations. Doing so allows us to solve the MINMPC problem with a considerably shorter prediction horizon online, thereby reducing the online computation cost. The paper uses an inverted pendulum example with discrete controls to illustrate this approach.
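The horizon split mentioned in the abstract can be written out as a short derivation. This is a sketch under assumed notation that does not appear in the text above: $N$ is the full horizon length, $M < N$ the shortened online horizon, $V_k$ the optimal value function over $k$ remaining steps, and $\hat{V}$ the offline approximation. A terminal cost is omitted for brevity.

```latex
\begin{align}
V_N(x_0)
  &= \min_{\{u_k, z_k\}_{k=0}^{N-1}} \sum_{k=0}^{N-1} \ell(x_k, u_k, z_k) \\
  &= \min_{\{u_k, z_k\}_{k=0}^{M-1}}
     \left[ \sum_{k=0}^{M-1} \ell(x_k, u_k, z_k) + V_{N-M}(x_M) \right] \\
  &\approx \min_{\{u_k, z_k\}_{k=0}^{M-1}}
     \left[ \sum_{k=0}^{M-1} \ell(x_k, u_k, z_k) + \hat{V}(x_M) \right],
\end{align}
\text{subject to } x_{k+1} = f(x_k, u_k, z_k), \quad g(x_k, u_k, z_k) \le 0 .
```

The second line is Bellman's principle of optimality and is exact; the approximation enters only in the third line, where $V_{N-M}$ is replaced by $\hat{V}$. Setting $M = 1$ and $\hat{V}(x) = x^{\mathsf{T}} P x$ gives the one-step problem stated in the TL;DR.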
