CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design

Zeji Yi, Chaoyi Pan, Guanqi He, Guannan Qu, Guanya Shi

TL;DR

This paper characterizes the convergence properties of Model Predictive Path Integral Control (MPPI), a widely used sampling-based MPC method, and shows that MPPI converges at least linearly when the optimization is quadratic, which covers time-varying LQR systems.

Abstract

Sampling-based Model Predictive Control (MPC) has been a practical and effective approach in many domains, notably model-based reinforcement learning, thanks to its flexibility and parallelizability. Despite its appealing empirical performance, a theoretical understanding of the method is still lacking, particularly in terms of convergence analysis and hyperparameter tuning. In this paper, we characterize the convergence properties of a widely used sampling-based MPC method, Model Predictive Path Integral Control (MPPI). We show that MPPI enjoys at least linear convergence rates when the optimization is quadratic, which covers time-varying LQR systems. We then extend the analysis to more general nonlinear systems. Our theoretical analysis directly leads to a novel sampling-based MPC algorithm, CoVariance-Optimal MPC (CoVO-MPC), which optimally schedules the sampling covariance to optimize the convergence rate. Empirically, CoVO-MPC significantly outperforms standard MPPI by 43-54% in both simulations and real-world quadrotor agile control tasks. Videos and appendices are available at https://lecar-lab.github.io/CoVO-MPC/.

Paper Structure

This paper contains 26 sections, 7 theorems, 79 equations, 5 figures, 5 tables.

Key Result

theorem 1

Given $U_{\mathrm{in}}$ and the sampling distribution $\mathcal{N}(U_{\mathrm{in}},\Sigma)$, under the assumption that the total cost $J(U)$ has the quadratic form of eq:quad_cost and $D$ is positive semi-definite, the weighted sum of the samples $U_{\mathrm{out}}$, as in eq:MPPI, converges in probability [...]. Similarly, the expected cost contracts to the optimum $J^*$ in the following way: [...]
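
The update the theorem refers to (sample around $U_{\mathrm{in}}$, weight each sample by its exponentiated negative cost, and average) can be sketched on a toy quadratic cost. This is a minimal illustration under stated assumptions, not the paper's implementation: the cost is taken as $J(U) = U^\top D U + d^\top U$ with made-up values for `D` and `d`, the sampling covariance is simplified to the isotropic $\sigma^2 I$ rather than the optimized covariance that CoVO-MPC designs, and the names `mppi_step`, `run_demo`, `sigma`, and `temperature` are hypothetical.

```python
import math
import random


def quadratic_cost(u, D, d):
    """Toy cost J(U) = U^T D U + d^T U (D and d are illustrative assumptions)."""
    n = len(u)
    quad = sum(u[i] * D[i][j] * u[j] for i in range(n) for j in range(n))
    lin = sum(d[i] * u[i] for i in range(n))
    return quad + lin


def mppi_step(u_in, cost, sigma, temperature, num_samples, rng):
    """One MPPI update: softmax-weighted average of Gaussian samples.

    Assumes an isotropic sampling covariance sigma^2 * I for simplicity.
    """
    samples = [[ui + sigma * rng.gauss(0.0, 1.0) for ui in u_in]
               for _ in range(num_samples)]
    costs = [cost(s) for s in samples]
    c_min = min(costs)  # shift by the minimum cost for numerical stability
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(weights)
    return [sum(w * s[k] for w, s in zip(weights, samples)) / total
            for k in range(len(u_in))]


def run_demo(iterations=30, seed=0):
    """Iterate the update from a distant start and record the cost."""
    rng = random.Random(seed)
    D = [[2.0, 0.0], [0.0, 0.5]]
    d = [-4.0, -1.0]  # minimizer is -0.5 * D^{-1} d = [1.0, 1.0]

    def cost(u):
        return quadratic_cost(u, D, d)

    u = [5.0, -3.0]
    history = [cost(u)]
    for _ in range(iterations):
        u = mppi_step(u, cost, sigma=0.5, temperature=0.1,
                      num_samples=2000, rng=rng)
        history.append(cost(u))
    return u, history


if __name__ == "__main__":
    u_final, history = run_demo()
    print("final U:", u_final)
    print("cost at start and end:", history[0], history[-1])
```

With these toy values the iterate should settle near the minimizer $[1, 1]$, up to noise from the finite sample count. Changing `sigma` or `temperature` alters how quickly the cost contracts, which is the hyperparameter dependence the paper's analysis addresses.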

Figures (5)

  • Figure 1: CoVO-MPC
  • Figure 2: Quadrotor tracking errors as the sample number $N$ increases.
  • Figure 3: (a-b) Real-world quadrotor trajectory tracking results. CoVO-MPC tracks the challenging infeasible triangle trajectory more closely than MPPI. (c) The cost distribution of sampled trajectories at a certain time step in the quadrotor simulation. The cost of CoVO-MPC is more concentrated and has a lower mean than that of MPPI.
  • Figure 4: The experiment setup.
  • Figure 5: Covariance matrix visualization at a certain time step. The rightmost panel shows the element-wise difference between CoVO-MPC and MPPI.

Theorems & Definitions (15)

  • theorem 1
  • theorem 2
  • corollary 1
  • theorem 3
  • theorem 4
  • corollary 2
  • lemma 1: Slutsky's theorem
  • proof
  • proof
  • remark 1
  • ...and 5 more