Infinite-Horizon Value Function Approximation for Model Predictive Control
Armand Jordana, Sébastien Kleff, Arthur Haffemayer, Joaquim Ortiz-Haro, Justin Carpentier, Nicolas Mansard, Ludovic Righetti
TL;DR
This work tackles the challenge of achieving global stability in constrained model predictive control (MPC) by approximating the infinite-horizon value function with neural networks. It introduces an undiscounted, hard-constrained framework in which a neural network value function $V_{\theta}$ is learned by value iteration and then used as the terminal cost of a $T$-step lookahead MPC, ensuring constraint satisfaction on a known control-invariant set $\Omega$. Training uses a supervised value-iteration objective whose targets are obtained by solving the $T$-step optimal control problem (OCP) with a stagewise SQP solver, together with a penalty enforcing $V_{\theta}(x^s)=0$ at stationary points $x^s$; at deployment, online re-optimization compensates for approximation errors. Experiments on toy problems and a real 7-DoF manipulator show fewer local minima and robust constraint handling, with inference times compatible with real-time control and performance gains that grow with horizon length. The work points toward safe online adaptation in robotics by uniting trajectory optimization with learned infinite-horizon value functions under hard constraints.
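The training loop summarized above can be sketched on a deliberately tiny problem. This is a hypothetical illustration, not the authors' code: the system (a scalar integrator $x_{k+1} = x_k + u_k$ with stage cost $x^2 + u^2$ and $|u| \le 1$), the quadratic model $V_{\theta}(x) = a x^2 + c$ standing in for the neural network, the projected-gradient solver standing in for SQP, and all names and constants (`solve_ocp`, `fit`, `value_iteration`, `LAM`, `T = 3`) are assumptions. Each value-iteration step solves the $T$-step OCP from sampled states with the current $V_{\theta}$ as terminal cost, then refits $\theta$ by least squares plus a penalty on $V_{\theta}(x^s)^2$ with $x^s = 0$.

```python
import random

# Hypothetical toy problem (not from the paper): scalar integrator
# x_{k+1} = x_k + u_k, stage cost l(x, u) = x^2 + u^2, box constraint |u| <= U_MAX.
U_MAX = 1.0
T = 3       # horizon of the OCP that produces each value-iteration target
LAM = 10.0  # weight of the penalty enforcing V_theta(x_s) = 0 at x_s = 0


def value(theta, x):
    """Quadratic stand-in for the neural network: V_theta(x) = a * x^2 + c."""
    a, c = theta
    return a * x * x + c


def rollout(x0, us):
    """Simulate the dynamics; returns the T + 1 states x_0 .. x_T."""
    xs = [x0]
    for u in us:
        xs.append(xs[-1] + u)
    return xs


def solve_ocp(x0, theta, iters=300):
    """T-step OCP with terminal cost V_theta, solved by projected gradient
    descent (a simple substitute for the SQP solver used in the paper)."""
    a, _ = theta
    us = [0.0] * T
    # For a >= 0 the cost is a convex quadratic in us; 1 / step bounds its curvature.
    step = 1.0 / (2.0 + 2.0 * T * T + 2.0 * a * T)
    for _ in range(iters):
        xs = rollout(x0, us)
        # Backward pass: dJ/du_j = 2 u_j + sum_{k=j+1}^{T-1} 2 x_k + V'(x_T).
        grad = [0.0] * T
        acc = 2.0 * a * xs[T]
        for j in reversed(range(T)):
            grad[j] = 2.0 * us[j] + acc
            acc += 2.0 * xs[j]
        # Gradient step followed by projection onto the box constraint.
        us = [max(-U_MAX, min(U_MAX, u - step * g)) for u, g in zip(us, grad)]
    xs = rollout(x0, us)
    cost = sum(x * x + u * u for x, u in zip(xs[:T], us)) + value(theta, xs[T])
    return cost, us


def fit(xs, ys):
    """Minimize sum_i (V_theta(x_i) - y_i)^2 + LAM * V_theta(0)^2 in closed form."""
    n = len(xs)
    zs = [x * x for x in xs]
    szz = sum(z * z for z in zs)
    sz = sum(zs)
    szy = sum(z * y for z, y in zip(zs, ys))
    sy = sum(ys)
    det = szz * (n + LAM) - sz * sz
    a = (szy * (n + LAM) - sz * sy) / det
    c = (szz * sy - sz * szy) / det
    return a, c


def value_iteration(n_iters=10, n_samples=20, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    theta = (0.0, 0.0)  # V_0 = 0
    for _ in range(n_iters):
        # Targets: optimal T-step cost with the current V_theta as terminal cost.
        ys = [solve_ocp(x, theta)[0] for x in xs]
        theta = fit(xs, ys)
    return theta
```

Calling `value_iteration()` returns $\theta = (a, c)$. Because the box constraint never binds for the sampled states $|x| \le 1$, $a$ should approach the golden ratio $(1+\sqrt{5})/2 \approx 1.618$, the exact unconstrained LQR value for this system, and $c$ should stay near zero. Where the constraint binds (for example `solve_ocp(5.0, theta)`, whose controls saturate at $-1$), the true value function is no longer a single quadratic, which is one motivation for a richer function class such as a neural network.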
Abstract
Model Predictive Control (MPC) has emerged as a popular tool for generating complex robot motions. However, the real-time requirement has limited the use of hard constraints and long preview horizons, which are necessary to ensure safety and stability. In practice, one must carefully design cost functions that imitate an infinite-horizon formulation, which is tedious and often results in local minima. In this work, we study how to approximate the infinite-horizon value function of constrained optimal control problems with neural networks, using value iteration and trajectory optimization. Furthermore, we experimentally demonstrate that using this value function approximation as a terminal cost provides global stability to the model predictive controller. The approach is validated on two toy problems and on a real-world scenario of online obstacle avoidance with an industrial manipulator, where the value function is conditioned on the goal and the obstacle.
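The role of the terminal cost can be illustrated in the smallest possible setting. The sketch below is a hypothetical example, not taken from the paper: a scalar integrator $x_{k+1} = x_k + u_k$ with cost $x^2 + u^2$ and $|u| \le 1$, controlled by an MPC with a one-step horizon, where the exact unconstrained value $V(x) = \varphi x^2$ ($\varphi$ the golden ratio) stands in for a learned terminal cost. The names `mpc_control`, `closed_loop` and `PHI` are illustrative assumptions.

```python
import math

# Hypothetical toy setting: x_{k+1} = x_k + u_k, l(x, u) = x^2 + u^2, |u| <= u_max.
# The unconstrained infinite-horizon value is V(x) = PHI * x^2, where PHI is the
# positive root of the scalar Riccati equation P^2 - P - 1 = 0 (the golden ratio).
PHI = (1.0 + math.sqrt(5.0)) / 2.0


def mpc_control(x, a, u_max=1.0):
    """One-step MPC: min_u x^2 + u^2 + a * (x + u)^2 subject to |u| <= u_max.
    The objective is a convex parabola in u, so clipping its minimizer is exact."""
    u = -a * x / (1.0 + a)
    return max(-u_max, min(u_max, u))


def closed_loop(x0, a, steps=30, u_max=1.0):
    """Receding horizon: apply the first control, step the system, re-solve."""
    x = x0
    for _ in range(steps):
        x = x + mpc_control(x, a, u_max)
    return x


print(closed_loop(5.0, 0.0))  # no terminal cost: prints 5.0, the state never moves
print(closed_loop(5.0, PHI))  # terminal cost PHI * x^2: ends close to 0
```

Without a terminal cost, the one-step problem is minimized by $u = 0$, so the controller is myopic and the state stays put. With the terminal cost, the controller saturates at $u = -1$ while far from the origin and then contracts the state geometrically, showing how a value function approximation can substitute for a long horizon.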
