DiffTune-MPC: Closed-Loop Learning for Model Predictive Control

Ran Tao; Sheng Cheng; Xiaofeng Wang; Shenlong Wang; Naira Hovakimyan

TL;DR

DiffTune-MPC addresses the challenge of tuning MPC cost parameters by learning in a closed-loop setting, allowing the evaluation horizon $N$ to exceed the planner horizon $T$. The method derives analytical gradients for the MPC first action through a KKT-based auxiliary optimization, covering linear MPC with linear inequalities and nonlinear MPC via SQP, and then updates the cost parameters with a projected gradient method. Empirical results in simulation and a high-fidelity RotorPy quadrotor simulator show faster convergence and better generalization than baselines, demonstrating the practicality of gradient-based, closed-loop auto-tuning for MPC. This work broadens differentiable programming for control by enabling cost-function learning for both linear and nonlinear MPC under constraints, with promising implications for robust, constraint-aware robotic control.

Abstract

Model predictive control (MPC) has been applied to many platforms in robotics and autonomous systems for its capability to predict a system's future behavior while incorporating the constraints that the system may have. To enhance the performance of a system with an MPC controller, one can manually tune the MPC's cost function. However, manual tuning can be challenging because of the possibly high dimension of the parameter space and the potential mismatch between the open-loop cost function in the MPC and the overall closed-loop performance metric. This paper presents DiffTune-MPC, a novel method for learning the cost function of an MPC in a closed-loop manner. The proposed framework is compatible with the scenario where the time interval for performance evaluation and the MPC's planning horizon have different lengths. We present the auxiliary problem whose solution yields the analytical gradients of the MPC and discuss its variations in different MPC settings, including nonlinear MPCs that are solved using sequential quadratic programming. Simulation results demonstrate the learning capability of DiffTune-MPC and the generalization capability of the learned MPC parameters.
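As a rough illustration of the closed-loop learning loop summarized above, the following Python sketch tunes a single cost weight `q` of a toy scalar controller. Everything in it is an assumption made for illustration and is not taken from the paper: the plant `x_{k+1} = A x_k + B u_k`, the function names, the step size, and the bounds. In particular, an unconstrained horizon-`T` LQR stands in for the MPC, and its first-action gradient is obtained by differentiating the Riccati recursion rather than by solving the KKT-based auxiliary problem the paper describes. What the sketch shares with the description is the overall structure: the controller plans over `T` steps, the loss is accumulated over a longer closed-loop horizon `N`, the state sensitivity is propagated by the chain rule, and the parameter is updated by a projected gradient step.

```python
# Toy closed-loop tuning sketch (all values and names are illustrative assumptions).
A, B = 1.0, 0.1      # scalar plant x_{k+1} = A*x_k + B*u_k
T, N = 5, 40         # planner horizon T, closed-loop evaluation horizon N > T


def mpc_gain_and_grad(q, r=1.0):
    """First-step gain k of a horizon-T LQR with stage cost q*x^2 + r*u^2,
    plus dk/dq from forward-mode differentiation of the Riccati recursion.
    (Stand-in for the paper's KKT-based auxiliary problem.)"""
    p, dp = q, 1.0   # terminal weight P_T = q and its derivative w.r.t. q
    k, dk = 0.0, 0.0
    for _ in range(T):
        denom = r + B * p * B
        k = B * p * A / denom
        dk = B * A * r * dp / denom**2
        p_next = q + A * p * A - A * p * B * k
        dp = 1.0 + A * A * dp - A * B * (dp * k + p * dk)
        p = p_next
    return k, dk


def rollout_with_gradient(q, x0=1.0):
    """Closed-loop rollout over N steps. Returns the tracking loss sum(x_k^2)
    and its derivative w.r.t. q via sensitivity propagation."""
    k, dk_dq = mpc_gain_and_grad(q)
    x, dx_dq = x0, 0.0
    loss, dloss_dq = 0.0, 0.0
    for _ in range(N):
        u = -k * x                        # first action of the planner
        du_dq = -k * dx_dq - dk_dq * x    # (du/dx) * dx/dq + du/dq
        x = A * x + B * u                 # plant step
        dx_dq = A * dx_dq + B * du_dq     # state sensitivity step
        loss += x**2
        dloss_dq += 2.0 * x * dx_dq
    return loss, dloss_dq


def tune(q0=1.0, q_min=0.1, q_max=100.0, step=0.5, iters=50):
    """Projected gradient descent on q, kept inside [q_min, q_max]."""
    q, history = q0, []
    for _ in range(iters):
        loss, grad = rollout_with_gradient(q)
        history.append(loss)
        q = min(max(q - step * grad, q_min), q_max)   # projection onto the box
    return q, history


if __name__ == "__main__":
    q_final, history = tune()
    print("initial loss:", history[0])
    print("final loss:", history[-1])
    print("tuned q:", q_final)
```

Because the loss in this toy example penalizes only the state and not the control effort, its gradient with respect to `q` stays negative, so `q` keeps increasing and the projection caps it at `q_max` if it gets that far. A loss that also penalized control effort, or a constrained planner, would generally give an interior optimum instead.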

Paper Structure (18 sections, 39 equations, 7 figures, 4 tables)

Introduction
Related Work
Background
Problem Formulation
The DiffTune-MPC Method
Differentiating a Linear MPC
Linear MPC with interpretable quadratic cost function
Nonlinear MPC
Simulation results
Experimental Results with a High-fidelity Quadrotor Simulator
Conclusion
Differentiation of an LQR
Differentiation of an LQR with $(Q,R)$-parametrization
Differentiation of an LQR with linear inequality constraints
Differential wheeled robot simulation
...and 3 more sections

Figures (7)

Figure 1: Simulation results comparing the learning progress of DiffTune-MPC (ours) and the baseline PS-MPC [song2022policy]. The shaded areas show the range of RMSEs (min to max) achieved with a total of 10 sampled parameters by PS-MPC.
Figure 2: Left: Evolution of the tracking performance with DiffTune-MPC in RotorPy over a total of 11 3D Lissajous training trajectories. The curve shows the mean RMSE, whereas the shaded area shows the standard deviation. Right: Performance comparison over the 3D Lissajous trajectory with a total duration of 10 s.
Figure 3: Optimal closed-loop control actions with the differential wheeled robot under trajectory 2.
Figure 4: Optimal control actions and RMSE reduction subject to box constraints with different levels of tightness $u_{bd}$. Tighter constraints imply more frequent control saturation, which impedes learning from improving tracking performance.
Figure 5: Polynomial Trajectory 1 used in Table \ref{table:comp_W}.
...and 2 more figures

Theorems & Definitions (4)

Remark 1
Remark 2
Remark 3
Remark 4
