Toward Single-Step MPPI via Differentiable Predictive Control

Viet-Anh Le, Renukanandan Tumu, Rahul Mangharam

Abstract

Model predictive path integral (MPPI) is a sampling-based method for solving complex model predictive control (MPC) problems, but its real-time implementation faces two key challenges: the computational cost and sample requirements grow with the prediction horizon, and manually tuning the sampling covariance requires balancing exploration against noise. To address these issues, we propose Step-MPPI, a framework that learns a sampling distribution for an efficient single-step-lookahead MPPI implementation. Specifically, we use a neural network to parameterize the MPPI proposal distribution at each time step, and train it in a self-supervised manner over a long horizon using the MPC cost, constraint penalties, and a maximum-entropy regularization term. By embedding long-horizon objectives into training the neural distribution policy, Step-MPPI achieves the foresight of a multi-step optimizer with the millisecond-level latency of single-step lookahead. We demonstrate the efficiency of Step-MPPI across multiple challenging tasks in which MPPI suffers from high dimensionality and/or long control horizons.
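To make the single-step idea concrete, here is a minimal sketch (not the authors' code) of what inference with a learned proposal could look like: a network maps the current state to the mean and covariance factor of a Gaussian proposal, controls are sampled, rolled out one step through the dynamics, scored, and combined with softmax weights. The names `policy_net`, `step`, and `stage_cost` are hypothetical placeholders, not APIs from the paper.

```python
import torch

def step_mppi_action(x, policy_net, step, stage_cost, K=256, lam=1.0):
    """Single-step-lookahead MPPI with a learned Gaussian proposal.

    Minimal sketch: policy_net(x) -> (mu, L) parameterizing N(mu, L L^T);
    step(x, u) and stage_cost(x_next, u) are placeholder dynamics and cost.
    """
    mu, L = policy_net(x)                     # proposal parameters at state x
    eps = torch.randn(K, mu.shape[-1])
    u = mu + eps @ L.T                        # K sampled controls
    x_next = step(x.expand(K, -1), u)         # one-step rollout only
    c = stage_cost(x_next, u)                 # per-sample costs, shape (K,)
    w = torch.softmax(-c / lam, dim=0)        # MPPI softmax weights
    return (w.unsqueeze(-1) * u).sum(dim=0)   # weighted-average control
```

Because the trained network supplies state-dependent proposal parameters, no multi-step rollout or receding-horizon optimization is needed at inference time.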

Paper Structure

This paper contains 15 sections, 1 theorem, 46 equations, 5 figures, 3 tables.

Key Result

Lemma 1

Consider the importance-sampling weighting strategy of eq. (mppi_weights_softmax); the gradients of $w_k$ with respect to $\boldsymbol{\mu}$ and $\boldsymbol{L}$ can be computed in closed form, where $c^{(k)} = c(\boldsymbol{x}_{h+1}^{(k)}, \boldsymbol{u}_h^{(k)}; \boldsymbol{r}_{h+1})$ denotes the cost of the $k$-th sample.
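Closed-form gradients of this kind can be cross-checked with automatic differentiation. The sketch below (not from the paper) assumes the standard MPPI softmax weighting $w_k \propto \exp(-c^{(k)}/\lambda)$ and that $\boldsymbol{L}$ is a Cholesky-style factor of the proposal covariance; the quadratic toy cost and all variable names are hypothetical.

```python
import torch

torch.manual_seed(0)
K, m = 64, 2          # number of samples, control dimension
lam = 1.0             # MPPI temperature

# Proposal parameters: mean and covariance factor (assumed Cholesky-style).
mu = torch.zeros(m, requires_grad=True)
L = torch.eye(m, requires_grad=True)

# Reparameterized samples u^(k) = mu + L eps^(k), so gradients reach mu and L.
eps = torch.randn(K, m)
u = mu + eps @ L.T

# Hypothetical single-step cost c^(k); a real rollout would use the dynamics.
target = torch.tensor([1.0, -0.5])
c = ((u - target) ** 2).sum(dim=1)

# Softmax importance-sampling weights (standard MPPI weighting, assumed here).
w = torch.softmax(-c / lam, dim=0)

# Gradient of one weight w_k w.r.t. mu and L, obtained by autodiff.
k = 0
grad_mu, grad_L = torch.autograd.grad(w[k], (mu, L))
print(grad_mu, grad_L)
```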

Figures (5)

  • Figure A1: Overview of the proposed Step-MPPI versus conventional MPPI. In Step-MPPI, samples are drawn from a distribution parameterized by the neural network, whereas MPPI samples from a nominal distribution over the control horizon. During training, Step-MPPI performs rollouts over the full control horizon, while during inference, only a single-step rollout at the current state is needed.
  • Figure D1: Three numerical examples considered for validation.
  • Figure D2: Cross-track error comparison of MPPI, DPC, and Step-MPPI in the autonomous vehicle example.
  • Figure D3: Box-plot comparison of tracking performance for MPPI, DPC, and Step-MPPI in the quadrupedal robot task.
  • Figure D4: Total network accumulation over time for the in-distribution (a) and out-of-distribution (b) cases.

Theorems & Definitions (2)

  • Lemma 1
  • Proof of Lemma 1