Statistically Consistent Approximate Model Predictive Control
Elias Milios, Kim P. Wabersich, Felix Berkel, Felix Gruber, Melanie N. Zeilinger
TL;DR
The paper tackles the computational cost of MPC and the failure of imitation learning (IL) on set-valued MPC policies by proposing a two-stage IL approach. The approach embeds the MPC objectives into a differentiable one-step look-ahead loss $L_{\text{MPC}}(x, \pi) = \ell(x, \pi) + V(f(x, \pi), \hat{\theta}_V)$, where $V(\cdot, \hat{\theta}_V)$ approximates the optimal value function of a stabilizing soft-constrained MPC (SCMPC). The authors prove statistical consistency, meaning that exact minimizers of the IL objective converge to the set of MPC solutions as the amount of data grows. They also establish input-to-state stability (ISS) for approximate policies learned from finite data, which provides practical safety guarantees. The method combines value-function approximation (Stage 1) with MPC-inspired policy learning (Stage 2), yielding explicit, fast policies that remain constraint-aware. Numerical experiments on a scalar set-valued example and a 2D obstacle-avoidance task show improved safety and performance compared with standard behavioral cloning, indicating practical value for real-time MPC deployments that require safety guarantees.
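To make the two stages concrete, here is a minimal Python sketch on a scalar linear-quadratic problem rather than the paper's SCMPC setting. Everything in it is a hypothetical assumption: the dynamics $f(x, u) = ax + bu$, the quadratic stage cost, the quadratic value model $V(x; \theta) = \theta x^2$, the linear policy $\pi(x) = -kx$, and the use of the infinite-horizon LQR value (from the scalar Riccati recursion) as stand-in value-function samples. Stage 1 fits $\theta$ by least squares; Stage 2 runs gradient descent on the mean of $L_{\text{MPC}}$ over sampled states.

```python
"""Two-stage imitation learning sketch on a scalar linear-quadratic problem.

Assumption: the 'MPC value function' samples come from the infinite-horizon
LQR value P*x^2, a hypothetical stand-in for SCMPC optimal values.
"""

# Hypothetical problem data: x+ = A*x + B*u, stage cost Q*x^2 + R*u^2.
A, B, Q, R = 1.2, 1.0, 1.0, 0.5


def dynamics(x, u):
    return A * x + B * u


def stage_cost(x, u):
    return Q * x * x + R * u * u


def riccati_value_coefficient(iters=500):
    """Iterate the scalar Riccati recursion to get P with V*(x) = P*x^2."""
    p = Q
    for _ in range(iters):
        p = Q + A * A * p - (A * B * p) ** 2 / (R + B * B * p)
    return p


def stage1_fit_value(xs, values):
    """Stage 1: least-squares fit of the value model V(x; theta) = theta*x^2."""
    num = sum(x * x * v for x, v in zip(xs, values))
    den = sum(x ** 4 for x in xs)
    return num / den


def stage2_learn_policy(xs, theta, lr=0.05, steps=500):
    """Stage 2: gradient descent on the mean one-step look-ahead loss
    L(x, pi) = stage_cost(x, pi(x)) + V(dynamics(x, pi(x)); theta)
    for the hypothetical linear policy pi(x) = -k*x."""
    k = 0.0
    for _ in range(steps):
        grad = 0.0
        for x in xs:
            u = -k * x
            x_next = dynamics(x, u)
            # dL/du = 2*R*u + 2*theta*B*x_next, and du/dk = -x.
            grad += (2 * R * u + 2 * theta * B * x_next) * (-x)
        k -= lr * grad / len(xs)
    return k


if __name__ == "__main__":
    xs = [i / 10 for i in range(-20, 21)]  # sampled states
    p = riccati_value_coefficient()
    theta = stage1_fit_value(xs, [p * x * x for x in xs])
    k = stage2_learn_policy(xs, theta)
    print(f"theta={theta:.4f}, k={k:.4f}, closed loop A-B*k={A - B * k:.4f}")
```

In this sketch Stage 2 uses only sampled states and the Stage-1 value estimate, and because the loss is quadratic in $k$, gradient descent should recover the closed-form minimizer $k = \theta a b / (r + \theta b^2)$.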
Abstract
Model Predictive Control (MPC) offers rigorous safety and performance guarantees but is computationally intensive. Approximate MPC (AMPC) aims to circumvent this drawback by learning a computationally cheaper surrogate policy. Common approaches focus on imitation learning (IL) via behavioral cloning (BC), minimizing a mean-squared-error loss on a collection of state-input pairs. However, BC fundamentally fails to provide accurate approximations when MPC solutions are set-valued due to non-convex constraints or local minima. We propose a two-stage IL procedure to accurately approximate nonlinear, potentially set-valued MPC policies. The method integrates an approximation of the MPC's optimal value function into a one-step look-ahead loss function, and thereby embeds the MPC's constraint and performance objectives into the IL objective. This is achieved by adopting a stabilizing soft-constrained MPC formulation, which reflects constraint violations in the optimal value function by combining a constraint tightening with slack penalties. We prove statistical consistency for policies that exactly minimize our IL objective, implying convergence to a safe and stabilizing control law, and establish input-to-state stability guarantees for approximate minimizers. Simulations demonstrate improved performance compared to BC.
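The failure mode of BC on set-valued policies shows up in a toy case. The Python sketch below assumes a hypothetical scalar integrator $x^+ = x + u$ with a non-convex admissible set $|x^+| \ge 1$, two equally optimal labelled inputs $u = \pm 1.2$ at $x = 0$, and a hand-picked value estimate $V(y) = (y^2 - 1.44)^2$ standing in for a Stage-1 approximation; none of these come from the paper. MSE regression returns the label mean $u = 0$, which lands in the excluded region, whereas minimizing the one-step look-ahead loss (here by exhaustive search over an input grid instead of by training a policy) selects one of the two admissible branches.

```python
"""Why MSE behavioral cloning averages set-valued MPC labels, and how a
one-step look-ahead loss avoids it. All problem data are hypothetical."""


def dynamics(x, u):
    # Hypothetical scalar integrator: x+ = x + u.
    return x + u


def stage_cost(x, u):
    # Hypothetical input penalty.
    return 0.1 * u * u


def value_estimate(y):
    # Hypothetical stand-in for a Stage-1 value function: minima at |y| = 1.2,
    # inside the admissible set |y| >= 1 (loosely mimicking a tightened constraint).
    return (y * y - 1.44) ** 2


def is_admissible(y):
    # Hypothetical non-convex state constraint: stay out of the interval (-1, 1).
    return abs(y) >= 1.0


def behavioral_cloning(labels):
    # The MSE-optimal constant prediction is the mean of the labels.
    return sum(labels) / len(labels)


def lookahead_loss(x, u):
    # One-step look-ahead loss: stage cost plus value of the successor state.
    return stage_cost(x, u) + value_estimate(dynamics(x, u))


def lookahead_policy(x, grid):
    # Exact minimization over a finite input grid (a stand-in for training).
    return min(grid, key=lambda u: lookahead_loss(x, u))


if __name__ == "__main__":
    x0 = 0.0
    labels = [-1.2, 1.2]  # two equally optimal MPC inputs at x0
    grid = [i / 1000 for i in range(-2000, 2001)]
    u_bc = behavioral_cloning(labels)
    u_la = lookahead_policy(x0, grid)
    print("BC:", u_bc, "admissible:", is_admissible(dynamics(x0, u_bc)))
    print("look-ahead:", u_la, "admissible:", is_admissible(dynamics(x0, u_la)))
```

Which branch the look-ahead minimizer picks is arbitrary here, since the symmetric problem has two minimizers and `min` returns whichever it reaches first; the point is that it never averages across branches the way MSE regression does.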
