An Unsupervised C-Uniform Trajectory Sampler with Applications to Model Predictive Path Integral Control

O. Goktug Poyrazoglu, Rahul Moorthy, Yukang Cao, William Chastek, Volkan Isler

TL;DR

The paper tackles limited exploration in sampling-based MPC by introducing Neural C-Uniform, an unsupervised learner that maps states to control-input probabilities to uniformly cover the configuration space without discretization. It then couples this sampler with MPPI in CU-MPPI, using Neural C-Uniform trajectories to select a strong nominal and to guide stochastic sampling, with results showing improved performance on high-curvature paths and longer horizons. Key contributions include the entropy-maximization formulation for Neural C-Uniform, its neural architecture, and the CU-MPPI framework, validated across simulation and real-world cluttered and dynamic environments. The approach offers a scalable, uniformly exploratory alternative to gradient-reliant refinements, with practical impact for robust navigation in complex, changing settings.

Abstract

Sampling-based model predictive controllers generate trajectories by sampling control inputs from a fixed, simple distribution such as a normal or uniform distribution. This sampling method yields trajectory samples that are tightly clustered around a mean trajectory. This clustering, in turn, limits the exploration capability of the controller and reduces the likelihood of finding feasible solutions in complex environments. Recent work has attempted to address this problem by either reshaping the resulting trajectory distribution or increasing the sample entropy to enhance diversity and promote exploration. In our recent work, we introduced the concept of C-Uniform trajectory generation [1], which allows the computation of control input probabilities to generate trajectories that sample the configuration space uniformly. In this work, we first address the main limitation of this method: lack of scalability due to computational complexity. We introduce Neural C-Uniform, an unsupervised C-Uniform trajectory sampler that mitigates scalability issues by computing control input probabilities without relying on a discretized configuration space. Experiments show that Neural C-Uniform achieves a similar uniformity ratio to the original C-Uniform approach and generates trajectories over a longer time horizon while preserving uniformity. Next, we present CU-MPPI, which integrates Neural C-Uniform sampling into existing MPPI variants. We analyze the performance of CU-MPPI in simulation and real-world experiments. Our results indicate that in settings where the optimal solution has high curvature, CU-MPPI leads to drastic improvements in performance.
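The clustering effect described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the constant-speed unicycle model, the speed of 1.0, the noise level `sigma`, and the function name `rollout` are all assumptions chosen for illustration. The 30 degrees/second control bound, the 3-second horizon, and the 0.2-second step follow the figure captions. The sketch rolls out controls drawn independently from a zero-mean normal distribution and compares the spread of the final heading with the largest heading change the bound allows.

```python
import numpy as np

def rollout(controls, v=1.0, dt=0.2):
    """Integrate an assumed constant-speed unicycle.

    controls: array of shape (N, T) holding heading rates in rad/s.
    Returns states of shape (N, T + 1, 3) holding (x, y, theta).
    """
    n, t = controls.shape
    states = np.zeros((n, t + 1, 3))  # every rollout starts at the origin
    for k in range(t):
        x, y, th = states[:, k].T
        states[:, k + 1, 0] = x + v * np.cos(th) * dt
        states[:, k + 1, 1] = y + v * np.sin(th) * dt
        states[:, k + 1, 2] = th + controls[:, k] * dt
    return states

rng = np.random.default_rng(0)
n_samples, horizon, dt = 10_000, 15, 0.2   # 15 steps of 0.2 s = 3 s
u_max = np.deg2rad(30.0)                   # control bound from the Figure 1 caption
sigma = np.deg2rad(10.0)                   # assumed noise level, not from the paper

# Independent Gaussian controls around a zero nominal, clipped to the bound.
gaussian_controls = np.clip(
    rng.normal(0.0, sigma, (n_samples, horizon)), -u_max, u_max
)
final_heading = rollout(gaussian_controls, dt=dt)[:, -1, 2]

heading_std_deg = np.rad2deg(final_heading.std())
max_heading_deg = np.rad2deg(u_max * dt * horizon)
print(f"std of final heading: {heading_std_deg:.1f} deg")
print(f"largest reachable heading change: {max_heading_deg:.1f} deg")
```

Because the per-step perturbations are independent, the final heading spreads only as `dt * sigma * sqrt(T)`, while the reachable heading grows as `dt * u_max * T`. Under these assumed values most samples therefore stay within a narrow cone around the nominal, which is the behavior the abstract attributes to fixed, simple sampling distributions.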

Paper Structure

This paper contains 19 sections, 10 equations, 7 figures, 2 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Comparison of trajectory samples. The robot's configuration space is $(x,y,\theta)$. The forward speed is constant. The steering angle is directly controllable within $[-30, +30]$ degrees/second. The light blue area is the portion of the plane reachable within 3s. The dark blue part is the region visited by 10K trajectories generated by each approach. The color transition illustrates coverage in $\theta$, from dark blue (lowest) to red (highest). C-Uniform and Neural C-Uniform show high diversity in all three dimensions of the configuration space compared to MPPI and Log-MPPI.
  • Figure 2: CU-MPPI: In (a), the Neural C-Uniform trajectories (blue) are first evaluated for cost, and the minimum-cost trajectory is selected (green). In (b), MPPI is initialized with the green trajectory as the nominal, and the MPPI-generated trajectories are shown in red. Lastly, (c) shows the final trajectory (cyan) generated using the control signals of MPPI. The red rectangle represents the current vehicle configuration, and the blue circle shows the target position.
  • Figure 3: Uniformity Analysis: Neural C-Uniform learns to sample uniformly on all level sets. The X-axis represents the level sets for a 4-second time horizon with 0.2-second discretization. The extrapolation experiment (blue) trains with a 3-second time horizon at 0.2-second discretization and tests on a 4-second horizon, which shows the capability of Neural C-Uniform to plan for longer horizons. Neural C-Uniform maintains high uniformity on all level sets.
  • Figure 4: Circular Motion: CU-MPPI (green) and CU-LogMPPI (cyan) can navigate to a goal configuration while following the optimal path. The ability of Neural C-Uniform sampling to generate high-curvature turns helps in identifying the optimal regions. Log-MPPI (black and pink) achieves the closest results to our methods, but the higher variance (black) reduces the solution time. In contrast, MPPI (red) struggles due to limited exploration, and SVG-MPPI (dark blue) fails to steer trajectories effectively due to insufficient gradient information. Note that the MPPI trajectories overlap with SVG-MPPI's trajectories and are therefore not visible in the figure.
  • Figure 5: Three C2C tasks that demonstrate the performance of our method in high-curvature cases. The results report the number of successful runs out of 10 trials for each method in these three settings. In (a), all methods reach the target configuration, with CU-based methods and SVG-MPPI (dark blue) converging more quickly than the other baselines, which require more iterations to shift their distribution to the optimal one. In (b)-(c), CU-MPPI (green) and CU-LogMPPI (cyan) succeed, whereas the other baselines struggle to reach the target position because of limited exploration (Log-MPPI, pink; MPPI, red) or insufficient gradient information (SVG-MPPI).
  • ...and 2 more figures
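
The three stages in the Figure 2 caption can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation. The uniform random controls are only a placeholder for the learned Neural C-Uniform sampler, which is not reproduced here. The unicycle model, the terminal-distance cost, the goal position, and the names `rollout`, `terminal_cost`, and `cu_mppi_step` are hypothetical. The weighting step is the standard MPPI exponential weighting with temperature `lam`, without a control-cost term.

```python
import numpy as np

def rollout(controls, v=1.0, dt=0.2):
    """Assumed constant-speed unicycle; controls (N, T) in rad/s -> states (N, T+1, 3)."""
    controls = np.atleast_2d(controls)
    n, t = controls.shape
    states = np.zeros((n, t + 1, 3))
    for k in range(t):
        x, y, th = states[:, k].T
        states[:, k + 1, 0] = x + v * np.cos(th) * dt
        states[:, k + 1, 1] = y + v * np.sin(th) * dt
        states[:, k + 1, 2] = th + controls[:, k] * dt
    return states

def terminal_cost(states, goal):
    """Assumed cost: Euclidean distance from the final (x, y) to the goal."""
    return np.linalg.norm(states[:, -1, :2] - goal, axis=1)

def cu_mppi_step(candidates, goal, rng, n_mppi=512, sigma=0.1, lam=0.1,
                 u_max=np.deg2rad(30.0)):
    # (a) Evaluate the exploratory candidates and keep the cheapest as nominal.
    cand_cost = terminal_cost(rollout(candidates), goal)
    nominal = candidates[np.argmin(cand_cost)]

    # (b) MPPI sampling: Gaussian perturbations around the selected nominal.
    noise = rng.normal(0.0, sigma, (n_mppi, nominal.size))
    perturbed = np.clip(nominal + noise, -u_max, u_max)
    cost = terminal_cost(rollout(perturbed), goal)

    # Exponential weights; subtracting the minimum cost keeps exp() stable.
    weights = np.exp(-(cost - cost.min()) / lam)
    weights /= weights.sum()

    # (c) Weighted average of the sampled control sequences.
    updated = weights @ perturbed
    return nominal, updated

rng = np.random.default_rng(0)
horizon, u_max = 15, np.deg2rad(30.0)
goal = np.array([1.5, 1.5])  # hypothetical target position

# Placeholder sampler: uniform random controls, NOT the Neural C-Uniform model.
candidates = rng.uniform(-u_max, u_max, (1000, horizon))

nominal, updated = cu_mppi_step(candidates, goal, rng)
final_trajectory = rollout(updated)[0]  # (T + 1, 3) states of the executed plan
```

In a receding-horizon loop, only the first element of `updated` would typically be applied before the procedure repeats from the new state. Because `updated` is a convex combination of clipped samples, it stays within the control bound by construction.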