Output-Sampled Model Predictive Path Integral Control (o-MPPI) for Increased Efficiency

Leon Yan; Santosh Devasia

TL;DR

The paper addresses the inefficiency of model predictive path integral control (MPPI) when sampled inputs must satisfy output constraints in dynamic environments. It introduces output-sampling-based MPPI (o-MPPI), which samples trajectories in the output space and uses an inverse dynamics map $G^{-1}$ to obtain the corresponding inputs, while keeping the MPPI cost and weighting framework. In dynamic autonomous driving scenarios, o-MPPI achieves success rates similar to those of standard MPPI with about 20× fewer rollouts ($M$) and a 4× shorter prediction horizon ($T$). The approach combines output-space trajectory planning with inverse mappings to improve constraint satisfaction and computational efficiency, and it could potentially be integrated with learning-based inverse models and other planners.
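
The weighting step mentioned above can be illustrated with a minimal sketch. It assumes the common softmin form of MPPI weighting, where rollout $m$ with cost $S^m$ receives a weight proportional to $\exp(-S^m/\lambda)$; the paper's exact expression is not reproduced on this page, and the function names `mppi_weights` and `weighted_input` are hypothetical.

```python
import numpy as np

def mppi_weights(costs, lam):
    """Softmin weights, w_m proportional to exp(-S^m / lam) (assumed form)."""
    costs = np.asarray(costs, dtype=float)
    shifted = costs - costs.min()      # subtract the minimum for numerical stability
    w = np.exp(-shifted / lam)
    return w / w.sum()                 # normalize so the weights sum to one

def weighted_input(inputs, costs, lam):
    """Cost-weighted average of M candidate input sequences.

    inputs: array of shape (M, T, n_u); costs: array of shape (M,).
    Returns an array of shape (T, n_u).
    """
    w = mppi_weights(costs, lam)
    return np.tensordot(w, np.asarray(inputs, dtype=float), axes=1)
```

A smaller temperature `lam` concentrates the weight on the lowest-cost rollouts, while a larger one moves the result toward a plain average of all candidates.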

Abstract

The success of the model predictive path integral control (MPPI) approach depends on the appropriate selection of the input distribution used for sampling. However, it can be challenging to select inputs that satisfy output constraints in dynamic environments. The main contribution of this paper is an output-sampling-based MPPI (o-MPPI), which improves the ability of samples to satisfy output constraints and thereby increases MPPI efficiency. Comparative simulations and experiments of dynamic autonomous driving of bots around a track show that, for similar success rates, the proposed o-MPPI is more efficient than the standard MPPI: it requires substantially fewer rollouts (20 times fewer) and a shorter prediction horizon (4 times shorter). The supporting video for the paper can be found at https://youtu.be/snhlZj3l5CE.
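
To make the idea of mapping sampled outputs back to inputs concrete, here is a minimal sketch of an inverse map for a planar unicycle model. The unicycle model, the finite-difference scheme, and the name `inverse_unicycle` are assumptions chosen for illustration; this page does not give the paper's actual $G^{-1}$ or system model.

```python
import numpy as np

def inverse_unicycle(xy, dt):
    """Recover unicycle inputs from a sampled planar output trajectory.

    xy: array of shape (T + 1, 2) of positions; dt: time step in seconds.
    Returns (v, omega), each of shape (T,): forward speed and turn rate.
    """
    xy = np.asarray(xy, dtype=float)
    d = np.diff(xy, axis=0)                        # displacement per step, shape (T, 2)
    v = np.linalg.norm(d, axis=1) / dt             # speed from finite differences
    heading = np.unwrap(np.arctan2(d[:, 1], d[:, 0]))
    omega = np.diff(heading) / dt                  # turn rate, shape (T - 1,)
    # Repeat the last turn rate so omega has the same length as v.
    omega = np.append(omega, omega[-1] if omega.size else 0.0)
    return v, omega
```

Because every candidate starts as an output trajectory, output constraints such as track boundaries can be checked before any inputs are computed.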

Paper Structure (26 sections, 13 equations, 9 figures, 3 tables, 2 algorithms)

Introduction
Related work
MPPI with adjusted trajectory distribution
MPPI with output-space-informed mean
Inverse models
Proposed Framework
Application of o-MPPI to Experimental setup
System description
Standard MPPI
Forward model $F$
Cost function selection
MPPI algorithm
Weighting
o-MPPI
Sampling the trajectory rollout
...and 11 more sections

Figures (9)

Figure 1: Comparison between the proposed output-MPPI (o-MPPI) in red (top) and the standard MPPI in blue (bottom). Both use $M$ rollouts. In the standard MPPI, $f(\cdot)$ maps inputs to states and outputs; in the proposed o-MPPI, the inverse dynamics $G^{-1}(\cdot)$ maps output trajectories to inputs and states. $S^m$ is the cost of the $m^{th}$ rollout with output-state-input $(Y^m, X^m, u^m)$. $\lambda \in \mathbb{R}^+$ is the temperature parameter used in the weighting to obtain the optimized input $u$.
Figure 2: A generic way of sampling trajectory rollouts based on fitted smooth curves specified by waypoints that are selected from the regions of interest (dashed ellipsoids).
Figure 3: The left plot is the schematic drawing of the track. The middle plot illustrates an example region of interest (dashed blue rectangle) from which the output waypoint is sampled. The right plot shows an example output rollout where the solid circles indicate the current position and dashed circles indicate future positions in the rollout. The bright yellow rectangles indicate the collision regions.
Figure 4: The positions of the bots (TurtleBot3 Burger) are estimated from images taken with an Intel RealSense D435 camera. The control algorithms run on a computer, and the control commands are sent to the robots via a wireless router. In the Turtle Track photo, blue tape marks the track boundaries.
Figure 5: Initial rollouts for Cases 1 to 4 (left to right). The green circle represents the controlled bot, with the red line indicating its orientation. The blue circle denotes the constant-speed bot, which moves at $10$ cm/s. The bright yellow rectangle depicts the collision region of the constant-speed bot. The cost function component $c_c$ in Eq. \ref{eq_collide_cost} takes a large value if the controlled bot at $(x_k,y_k)$ enters the (extended) collision region. Grey lines show the trajectory rollouts of Cases 1 to 4 in Section \ref{subsec_overtake_case_study}.
...and 4 more figures
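
The waypoint-based rollout sampling described in the captions of Figures 2 and 3 can be sketched as follows. This is only an illustration under stated assumptions: a single waypoint drawn uniformly from a rectangular region of interest, and a quadratic Bézier curve as the fitted smooth curve. The paper's actual curve family is not stated on this page, and `sample_output_rollouts` and its parameters are hypothetical names.

```python
import numpy as np

def sample_output_rollouts(p0, heading, roi_min, roi_max, M, T, rng):
    """Sample M smooth output rollouts from p0 to random waypoints.

    p0: current position (2,); heading: current orientation in radians.
    roi_min, roi_max: opposite corners of the rectangular region of interest.
    Returns an array of shape (M, T + 1, 2).
    """
    p0 = np.asarray(p0, dtype=float)
    waypoints = rng.uniform(roi_min, roi_max, size=(M, 2))
    # Place the Bezier control point ahead of the bot along its heading,
    # so each curve leaves p0 tangent to the current orientation.
    dist = np.linalg.norm(waypoints - p0, axis=1, keepdims=True)
    ctrl = p0 + 0.5 * dist * np.array([np.cos(heading), np.sin(heading)])
    s = np.linspace(0.0, 1.0, T + 1)[None, :, None]   # curve parameter
    return ((1 - s) ** 2 * p0
            + 2 * (1 - s) * s * ctrl[:, None, :]
            + s ** 2 * waypoints[:, None, :])

rng = np.random.default_rng(0)
rollouts = sample_output_rollouts([0.0, 0.0], 0.0, [0.5, -0.2], [1.0, 0.2],
                                  M=8, T=10, rng=rng)
```

Each rollout starts at the current position and ends at its sampled waypoint, so restricting the region of interest directly restricts where the candidate outputs can go.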
