BP-MPC: Optimizing the Closed-Loop Performance of MPC using BackPropagation

Riccardo Zuliani, Efe C. Balta, John Lygeros

TL;DR

The paper addresses how to tune MPC policies to maximize closed-loop performance for constrained nonlinear systems. It introduces BP-MPC, a backpropagation framework that differentiates through the MPC policy, using conservative Jacobians to handle nonsmooth sensitivities and linearized dynamics to keep the MPC subproblems convex. Key contributions include a dual-QP-based method for differentiating the MPC solution, a modular backpropagation scheme over the whole horizon with guaranteed convergence to a critical point, extensions to state-dependent costs and constraints and to recovery from infeasibility, and a demonstration in nonlinear simulations. The approach yields a practical, convergent method for improving MPC performance and a path toward robust, differentiable MPC tuning.

Abstract

Model predictive control (MPC) is pervasive in research and industry. However, designing the cost function and the constraints of the MPC to maximize closed-loop performance remains an open problem. To achieve optimal tuning, we propose a backpropagation scheme that solves a policy optimization problem with nonlinear system dynamics and MPC policies. We enforce the system dynamics using linearization and allow the MPC problem to contain elements that depend on the current system state and on past MPC solutions. Moreover, we propose a simple extension that can deal with losses of feasibility. Our approach, unlike other methods in the literature, enjoys convergence guarantees.
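To make the idea of backpropagating through the closed loop concrete, here is a minimal sketch under strong simplifying assumptions. The scalar plant, the saturated linear feedback law (a toy stand-in for an MPC policy with input constraints), the cost weights, and the function names `clip` and `cost_and_grad` are all hypothetical and not taken from the paper, which differentiates a QP-based MPC instead. The sketch only illustrates the pattern: simulate forward, then propagate sensitivities backward through the dynamics and the nonsmooth policy to get the derivative of the closed-loop cost with respect to a tuning parameter.

```python
import math


def clip(v, lo, hi):
    """Saturate v to the interval [lo, hi]."""
    return max(lo, min(hi, v))


def cost_and_grad(k, x0=0.8, T=40, u_max=1.0, dt=0.1, r=0.1):
    """Closed-loop cost J(k) and a backpropagated derivative dJ/dk.

    Toy stand-ins (assumptions, not from the paper):
      plant   x+ = x + dt * (sin(x) + u)
      policy  u  = clip(-k * x, -u_max, u_max)
      cost    J  = sum_{t=1..T} x_t^2 + r * sum_{t=0..T-1} u_t^2
    """
    # Forward pass: simulate the closed loop and store the trajectory.
    xs, us = [x0], []
    for _ in range(T):
        u = clip(-k * xs[-1], -u_max, u_max)
        us.append(u)
        xs.append(xs[-1] + dt * (math.sin(xs[-1]) + u))
    cost = sum(x * x for x in xs[1:]) + r * sum(u * u for u in us)

    # Backward pass: g holds dJ/dx_{t+1}, including all downstream effects.
    g = 2.0 * xs[T]
    grad_k = 0.0
    for t in reversed(range(T)):
        x, u = xs[t], us[t]
        # Policy sensitivities. When saturated (or exactly at the kink) we
        # pick 0, a valid element of the generalized derivative of clip.
        active = abs(k * x) < u_max
        pi_x = -k if active else 0.0
        pi_k = -x if active else 0.0
        dJ_du = 2.0 * r * u + g * dt       # input enters the plant with gain dt
        grad_k += dJ_du * pi_k             # direct effect of k at time t
        f_x = 1.0 + dt * math.cos(x)       # plant linearization at x_t
        stage = 2.0 * x if t > 0 else 0.0  # x_0 carries no stage cost
        g = stage + g * f_x + dJ_du * pi_x
    return cost, grad_k
```

Calling `cost_and_grad(2.0)` returns the closed-loop cost and its derivative with respect to the gain `k`; a tuning loop would feed that derivative into a gradient-descent update of `k`.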

Paper Structure

This paper contains 25 sections, 12 theorems, 68 equations, 7 figures, 1 table, 5 algorithms.

Key Result

Lemma 1

Given two path-differentiable functions $\varphi:\mathbb{R}^{n_x}\to\mathbb{R}^{n_p}$, $\psi:\mathbb{R}^{n_p}\to\mathbb{R}^{n_u}$, with conservative Jacobians $\mathcal{J}_{\varphi}$ and $\mathcal{J}_{\psi}$, the function $\psi\circ\varphi$ is path-differentiable with conservative Jacobian $\mathcal
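As an illustration, the sketch below assumes the lemma takes the usual product form of the chain rule for conservative Jacobians: for the composition that first applies $\varphi$ and then $\psi$, multiplying an element of $\mathcal{J}_{\psi}$ evaluated at $\varphi(x)$ by an element of $\mathcal{J}_{\varphi}$ evaluated at $x$ gives an element of a conservative Jacobian of the composition. The two nonsmooth functions and all helper names are hypothetical choices made only for this example.

```python
def phi(x):
    """phi: R^2 -> R^2, nonsmooth where x[0] == x[1]."""
    return [max(0.0, x[0] - x[1]), x[0] * x[1]]


def jac_phi(x):
    """One element of a conservative Jacobian of phi (2x2 matrix)."""
    s = 1.0 if x[0] - x[1] > 0 else 0.0  # at the kink we pick 0
    return [[s, -s], [x[1], x[0]]]


def psi(p):
    """psi: R^2 -> R^1, nonsmooth where p[0] == 0."""
    return [abs(p[0]) + 2.0 * p[1]]


def jac_psi(p):
    """One element of a conservative Jacobian of psi (1x2 matrix)."""
    s = float((p[0] > 0) - (p[0] < 0))  # sign(p[0]), with sign(0) = 0
    return [[s, 2.0]]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def jac_composition(x):
    """Chain rule: element of J_psi at phi(x) times element of J_phi at x."""
    return matmul(jac_psi(phi(x)), jac_phi(x))
```

At a point where both functions are smooth, such as `[1.5, 0.5]`, the product coincides with the ordinary Jacobian of the composition. At a kink such as `[1.0, 1.0]` it still returns a usable generalized derivative, which is what lets backpropagation proceed through nonsmooth maps.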

Figures (7)

  • Figure 1: Suboptimality between closed-loop cost and optimal cost.
  • Figure 2: Comparison of closed-loop state and input trajectories.
  • Figure 3: Comparison of the relative suboptimality and the worst-case computation times (dashed lines) of a nonlinear MPC with different horizon lengths, and our scheme with fixed horizon $N=11$ (solid lines).
  • Figure 4: Comparison of the relative suboptimality and the worst-case computation times of a nonlinear MPC with terminal cost and different horizon lengths (dashed lines), and our scheme with fixed horizon $N=11$ (solid lines).
  • Figure 5: Closed-loop median performance (over $1000$ random noise samples) of nonlinear trajectory optimization (feedforward) and BP-MPC (receding horizon) with different noise magnitudes.
  • ...and 2 more figures

Theorems & Definitions (24)

  • Definition 1: bolte2021nonsmooth
  • Lemma 1: bolte2021conservative
  • Definition 2: coste1999introduction
  • Lemma 2: bolte2021conservative
  • Lemma 3: bolte2021nonsmooth_extended
  • Lemma 4: davis2020stochastic
  • Lemma 5
  • proof
  • Theorem 1
  • proof
  • ...and 14 more