Infinite-Horizon Value Function Approximation for Model Predictive Control

Armand Jordana, Sébastien Kleff, Arthur Haffemayer, Joaquim Ortiz-Haro, Justin Carpentier, Nicolas Mansard, Ludovic Righetti

TL;DR

This work tackles the challenge of achieving global stability in constrained model predictive control (MPC) by approximating the infinite-horizon value function with neural networks. It introduces a non-discounted, hard-constrained framework where a neural network value function $V_{\theta}$ is learned via value iteration over a finite horizon $T$ and then used as a terminal cost in a $T$-step lookahead MPC, ensuring constraint satisfaction on a known control-invariant set $\Omega$. The training leverages a supervised VI objective with targets generated by solving the $T$-step OCP (via stagewise SQP) and includes a penalty to enforce $V_{\theta}(x^s)=0$ at stationary points, while deployment relies on online re-optimization to compensate for approximation errors. Experiments on toy problems and a real 7-DoF manipulator demonstrate reduced local minima and robust constraint handling, with inference times compatible with real-time control and performance gains scaling with horizon length. The work lays a path toward safe, online adaptation in robotics by uniting trajectory optimization with learned infinite-horizon value functions under hard constraints.
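The scheme summarized above can be written out roughly as follows. This is a sketch under assumed notation rather than the paper's exact formulation: the dynamics $f$, stage cost $\ell$, constraint set $\mathcal{C}$, sampled training states $x_i$, and penalty weight $\lambda$ are symbols introduced here for illustration, and the squared-error loss and squared penalty are one plausible choice.

$$
(\mathcal{B}_T V)(x) \;=\; \min_{u_0,\dots,u_{T-1}} \; \sum_{k=0}^{T-1} \ell(x_k,u_k) + V(x_T)
\quad \text{s.t.} \quad x_0 = x,\;\; x_{k+1} = f(x_k,u_k),\;\; (x_k,u_k) \in \mathcal{C}
$$

$$
\theta_{j+1} \;\in\; \arg\min_{\theta} \; \sum_i \Big( V_{\theta}(x_i) - (\mathcal{B}_T V_{\theta_j})(x_i) \Big)^2 + \lambda \, V_{\theta}(x^s)^2
$$

No discount factor appears in $\mathcal{B}_T$, which matches the non-discounted setting. At deployment, the controller would solve the same $T$-step problem from the measured state with $V_{\theta}$ as terminal cost and apply only the first control $u_0$.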

Abstract

Model Predictive Control has emerged as a popular tool for robots to generate complex motions. However, the real-time requirement has limited the use of hard constraints and large preview horizons, which are necessary to ensure safety and stability. In practice, practitioners have to carefully design cost functions that can imitate an infinite horizon formulation, which is tedious and often results in local minima. In this work, we study how to approximate the infinite horizon value function of constrained optimal control problems with neural networks using value iteration and trajectory optimization. Furthermore, we experimentally demonstrate how using this value function approximation as a terminal cost provides global stability to the model predictive controller. The approach is validated on two toy problems and a real-world scenario with online obstacle avoidance on an industrial manipulator where the value function is conditioned on the goal and obstacle.
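The interplay between value iteration, lookahead horizon, and terminal cost can be illustrated on a problem small enough to solve exactly. The sketch below is not the paper's method: it assumes a hypothetical unconstrained scalar linear system with quadratic cost (the constants `A`, `B`, `Q`, `R` are made up), replaces the neural network with the one-parameter family $V(x) = p x^2$, and replaces the trajectory optimizer with the closed-form Riccati backup. Under those assumptions it shows that a $T$-step lookahead target needs fewer value-iteration sweeps as $T$ grows, and that an MPC using the converged value as terminal cost yields the same feedback gain for any horizon.

```python
import math

# Hypothetical toy problem (not from the paper):
# dynamics x_next = A*x + B*u, stage cost Q*x^2 + R*u^2, no constraints.
A, B, Q, R = 1.2, 1.0, 1.0, 1.0


def riccati_step(p):
    """One exact Bellman backup of V(x) = p*x^2 (minimization over u is closed-form)."""
    return Q + A * A * p - (A * B * p) ** 2 / (R + B * B * p)


def lookahead_backup(p, horizon):
    """T-step lookahead target: optimal T-step cost with terminal cost p*x^2."""
    for _ in range(horizon):
        p = riccati_step(p)
    return p


def infinite_horizon_p():
    """Positive root of the scalar Riccati equation, used as ground truth."""
    c = R - B * B * Q - A * A * R
    return (-c + math.sqrt(c * c + 4.0 * B * B * Q * R)) / (2.0 * B * B)


def value_iteration(horizon, tol=1e-9, max_iters=10_000):
    """Iterate p <- lookahead_backup(p, horizon) from p = 0; return (p, sweeps used)."""
    p_star = infinite_horizon_p()
    p = 0.0
    for sweep in range(1, max_iters + 1):
        p = lookahead_backup(p, horizon)
        if abs(p - p_star) < tol:
            return p, sweep
    return p, max_iters


def mpc_gain(p_terminal, horizon):
    """Gain k (u = -k*x) of the first control of a T-step MPC with terminal cost p_terminal*x^2."""
    # Cost-to-go seen after the first step: T-1 backups of the terminal cost.
    p_next = lookahead_backup(p_terminal, horizon - 1)
    return A * B * p_next / (R + B * B * p_next)


if __name__ == "__main__":
    for T in (1, 2, 5, 10):
        p, sweeps = value_iteration(T)
        print(f"T={T:2d}: p={p:.6f} after {sweeps} sweeps, MPC gain={mpc_gain(p, T):.6f}")
```

In this toy setting every backup is exact, so the only effect of a longer horizon is fewer sweeps. With a learned approximator each sweep also introduces fitting error, which is where online re-optimization over the horizon is expected to help.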

Paper Structure

This paper contains 18 sections, 16 equations, 6 figures, 1 algorithm.

Figures (6)

  • Figure 1: Rollout of MPC controllers with different horizon lengths using the learned value function as a terminal cost.
  • Figure 2: MPC trajectories for different initial conditions. The red dot represents the target $x^{\star}$.
  • Figure 3: Error between the ground truth and the learned value function during training for various horizon lengths. The larger the horizon, the faster the algorithm converges.
  • Figure 4: We run $1000$ MPC simulations starting from random initial states with increasing horizon for each controller. Horizon $0$ corresponds to the policy.
  • Figure 5: Snapshots of the pick-and-place task with static obstacle avoidance for the default MPC without value function (bottom) and the proposed MPC with value function (top). The green dots represent the end-effector targets that must be reached alternately while avoiding collision with the black rod placed in the center (highlighted in red).
  • ...and 1 more figure