CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design

Zeji Yi; Chaoyi Pan; Guanqi He; Guannan Qu; Guanya Shi

TL;DR

This paper characterizes the convergence properties of a widely used sampling-based MPC method, Model Predictive Path Integral Control (MPPI), and shows that MPPI converges at an at-least-linear rate when the optimization objective is quadratic, a setting that covers time-varying LQR systems.

Abstract

Sampling-based Model Predictive Control (MPC) has been a practical and effective approach in many domains, notably model-based reinforcement learning, thanks to its flexibility and parallelizability. Despite its appealing empirical performance, a theoretical understanding of the method, particularly in terms of convergence analysis and hyperparameter tuning, is still lacking. In this paper, we characterize the convergence properties of a widely used sampling-based MPC method, Model Predictive Path Integral Control (MPPI). We show that MPPI enjoys at least linear convergence rates when the optimization objective is quadratic, which covers time-varying LQR systems. We then extend the analysis to more general nonlinear systems. Our theoretical analysis directly leads to a novel sampling-based MPC algorithm, CoVariance-Optimal MPC (CoVO-MPC), which schedules the sampling covariance to optimize the convergence rate. Empirically, CoVO-MPC significantly outperforms standard MPPI by 43-54% in both simulations and real-world quadrotor agile control tasks. Videos and appendices are available at https://lecar-lab.github.io/CoVO-MPC/.

Paper Structure (26 sections, 7 theorems, 79 equations, 5 figures, 5 tables)

Introduction
Preliminaries and Related Work
Optimal Control and MPC
Sampling-based MPC and Applications in Model-based Reinforcement Learning
Problem Formulation
Main Theoretical Results
Convergence Analysis for Quadratic J(U)
Optimal Covariance Design
Generalization Beyond the Quadratic J(U)
The CoVariance-Optimal MPC (CoVO-MPC) Algorithm
Offline Covariance Matrix Approximation
Experiments
Tasks and Implementations
Performance and Computational Cost
Limitations and Future Work
...and 11 more sections

Key Result

theorem 1

Given $U_{\mathrm{in}}$ and the sampling distribution $\mathcal{N}(U_{\mathrm{in}},\Sigma)$, under the assumption that the total cost $J(U)$ has the quadratic form in eq:quad_cost and $D$ is positive semi-definite, the weighted sum of the samples $U_{\mathrm{out}}$, as in eq:MPPI, converges in probability … Similarly, the expected cost contracts to the optimum $J^*$ in the following way: … where …
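The setting of this result can be illustrated numerically with a minimal sketch of the MPPI update on a quadratic cost. The sketch below is not the authors' implementation. The cost parameterization $J(U) = (U - U^*)^\top D (U - U^*)$, the temperature `lam`, the sample count, and all numerical values are assumptions chosen for illustration, and the helper names `mppi_step` and `quad_cost` are hypothetical.

```python
import numpy as np

def quad_cost(U, D, U_star):
    """Assumed quadratic cost J(U) = (U - U*)^T D (U - U*), with optimum J* = 0."""
    d = U - U_star
    return float(d @ D @ d)

def mppi_step(U_in, Sigma, D, U_star, lam, N, rng):
    """One MPPI update: sample around U_in, then return the softmax-weighted mean."""
    # Draw N candidate control sequences from N(U_in, Sigma).
    samples = rng.multivariate_normal(U_in, Sigma, size=N)
    diff = samples - U_star
    # Evaluate the quadratic cost of every sample in one vectorized call.
    costs = np.einsum("ni,ij,nj->n", diff, D, diff)
    # Weights proportional to exp(-J / lam); subtracting the minimum cost
    # leaves the normalized weights unchanged and avoids underflow.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    # U_out is the weighted sum of the samples.
    return w @ samples

rng = np.random.default_rng(0)
D = np.diag([1.0, 10.0])           # positive definite, illustrative values
U_star = np.array([1.0, -2.0])     # minimizer of the assumed cost
Sigma = 0.5 * np.eye(2)            # fixed isotropic sampling covariance
U = np.zeros(2)                    # initial U_in

costs = [quad_cost(U, D, U_star)]
for _ in range(10):
    # Feed U_out back in as the next U_in.
    U = mppi_step(U, Sigma, D, U_star, lam=1.0, N=4096, rng=rng)
    costs.append(quad_cost(U, D, U_star))

print(costs)
```

Printing `costs` shows how the gap to the optimum shrinks across iterations until it reaches a floor set by the finite sample size. Replacing the isotropic `Sigma` with a different covariance changes how fast each direction contracts, which is the degree of freedom that the covariance design in CoVO-MPC targets.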

Figures (5)

Figure 1: CoVO-MPC
Figure 2: Quadrotor tracking errors as the sample number $N$ increases.
Figure 3: (a-b) Real-world quadrotor trajectory tracking results. CoVO-MPC tracks the challenging infeasible triangle trajectory more closely than MPPI. (c) The cost distribution of sampled trajectories at a certain time step in the quadrotor simulation. The cost under CoVO-MPC is more concentrated and has a lower mean than under MPPI.
Figure 4: The experiment setup.
Figure 5: Covariance matrix visualization at a certain time step. The rightmost panel is the element-wise difference between CoVO-MPC and MPPI.

Theorems & Definitions (15)

theorem 1
theorem 2
corollary 1
theorem 3
theorem 4
corollary 2
lemma 1: Slutsky's theorem
proof
proof
remark 1
...and 5 more
