Reinforcement Learning Compensated Model Predictive Control for Off-road Driving on Unknown Deformable Terrain

Prakhar Gupta, Jonathon M. Smereka, Yunyi Jia

TL;DR

The paper tackles high-speed off-road autonomous driving on unknown deformable terrain by introducing AC2MPC, a parallel compensation framework that combines a model-based MPC with a PPO-based actor-critic RL compensator. The RL component learns to offset unmodeled nonlinear terrain dynamics, while the MPC preserves safety and feasibility. The result is improved longitudinal tracking and smoother control, even when training data is limited or the agent is under-trained. Validation is performed in the high-fidelity Project Chrono simulator with deformable terrain modeled by the Bekker-Wong-based Soil Contact Model (SCM), across sandy, sandy-rocky, and clay-like soils, showing consistent performance gains over standalone MPC and standalone RL. The work demonstrates data-efficient learning augmentation suited to safer real-world deployment and outlines future work extending the approach to lateral control and testing on a real vehicle platform.
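The parallel compensation idea can be illustrated with a minimal sketch, assuming the RL compensator adds a correction to the MPC longitudinal command and the sum is clipped to normalized actuator bounds. The additive form, the clipping step, the bounds, and the names `compensated_command`, `mpc_policy`, and `rl_policy` are illustrative assumptions rather than the paper's implementation; the two policies are stand-in callables.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: (current speed, reference speed) -> normalized command.
Policy = Callable[[float, float], float]


@dataclass(frozen=True)
class ActuatorBounds:
    """Hypothetical normalized throttle/brake limits (assumed, not from the paper)."""
    lower: float = -1.0
    upper: float = 1.0


def compensated_command(
    speed: float,
    v_ref: float,
    mpc_policy: Policy,
    rl_policy: Policy,
    bounds: ActuatorBounds = ActuatorBounds(),
) -> float:
    """Parallel compensation (assumed additive form): u = u_mpc + delta_u, then clipped."""
    u_mpc = mpc_policy(speed, v_ref)   # command from the nominal, model-based controller
    delta_u = rl_policy(speed, v_ref)  # learned correction for unmodeled terrain effects
    u = u_mpc + delta_u
    # Keep the combined command inside the actuator limits.
    return max(bounds.lower, min(bounds.upper, u))


if __name__ == "__main__":
    # Stand-ins: a proportional law in place of the MPC and a constant residual.
    mpc_stub = lambda v, vr: 0.5 * (vr - v)
    rl_stub = lambda v, vr: 0.1
    print(compensated_command(3.0, 4.0, mpc_stub, rl_stub))  # 0.5 + 0.1
    print(compensated_command(0.0, 4.0, mpc_stub, rl_stub))  # 2.1 clipped to 1.0
```

In this assumed form, a compensator that outputs zero leaves the MPC command unchanged, which is one plausible reason an under-trained agent would still perform no worse than plain MPC.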

Abstract

This study presents an Actor-Critic reinforcement learning Compensated Model Predictive Controller (AC2MPC) designed for high-speed, off-road autonomous driving on deformable terrains. To address the difficulty of modeling unknown tire-terrain interaction while ensuring real-time control feasibility and performance, this framework integrates deep reinforcement learning with a model predictive controller to manage unmodeled nonlinear dynamics. We evaluate the controller framework over constant and varying velocity profiles using the high-fidelity simulator Project Chrono. Our findings demonstrate that our controller statistically outperforms standalone model-based and learning-based controllers on three unknown terrains: a sandy deformable track, a sandy and rocky track, and a cohesive clay-like deformable soil track. Despite varied and previously unseen terrain characteristics, this framework generalized well enough to track longitudinal reference speeds with the least error. Furthermore, this framework required significantly less training data than the purely learning-based controller, converging in fewer steps while delivering better performance. Even when under-trained, this controller outperformed the standalone controllers, highlighting its potential for safer and more efficient real-world deployment.
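The compensator is described as a PPO-based actor-critic agent. For readers unfamiliar with PPO, the sketch below computes the standard clipped surrogate loss for the actor and a mean-squared-error loss for the critic over a batch of samples. It assumes log-probabilities, advantages, value predictions, and returns have already been computed; the clip range `clip_eps=0.2` is a common default rather than a value reported in the paper, and the function names are illustrative.

```python
import math
from typing import Sequence


def ppo_actor_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Negative PPO clipped surrogate objective, averaged over the batch."""
    n = len(advantages)
    if n == 0 or len(logp_new) != n or len(logp_old) != n:
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Minimizing the negative maximizes the surrogate objective.
    return -total / n


def critic_loss(values: Sequence[float], returns: Sequence[float]) -> float:
    """Mean-squared error between the critic's value estimates and target returns."""
    if not values or len(values) != len(returns):
        raise ValueError("inputs must be non-empty and equally long")
    return sum((v - g) ** 2 for v, g in zip(values, returns)) / len(values)
```

The clipping caps how far a single update can move the policy away from the one that collected the data, which is the property that makes PPO a common choice for training stable compensators.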

Paper Structure (17 sections, 8 equations, 8 figures, 3 tables, 1 algorithm)

Figures (8)

  • Figure 1: AC2MPC controller training and simulation framework.
  • Figure 2: Online reference generation at time-step $i$: if insufficient progress has been made along the trajectory by time-step $i$, re-planned references (green) starting from the actual state $x_i$ are passed instead of the blue references, which represent the references had the expected progress been made.
  • Figure 3: Need for learning augmentation to MPC: even though the control inputs do not saturate at the control bounds, vanilla MPC performs significantly worse on deformable terrain than on rigid terrain.
  • Figure 4: Comparison for scenarios 1A and 1B: velocity tracking on terrain-1 for constant and varying reference velocities respectively.
  • Figure 5: Comparison for scenarios 2A and 2B: velocity tracking on terrain-2 for constant and varying reference velocities respectively.
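The re-planning rule in the Figure 2 caption can be sketched for the longitudinal case as follows. The sketch assumes $x$ is the distance travelled along the path, that references are obtained by integrating a reference speed profile forward over an $N$-step horizon, and that "not enough progress" means the vehicle lags the planned position by more than a tolerance. The tolerance, the forward-Euler integration, and all function names are hypothetical choices for illustration, not the paper's algorithm.

```python
from typing import Callable, List

SpeedProfile = Callable[[float], float]  # time (s) -> reference speed (m/s)


def horizon_references(
    x_start: float, t_start: float, v_ref: SpeedProfile, dt: float, horizon: int
) -> List[float]:
    """Position references for the next `horizon` steps, starting from x_start."""
    refs: List[float] = []
    x, t = x_start, t_start
    for _ in range(horizon):
        x += v_ref(t) * dt  # forward-Euler step along the reference speed profile
        t += dt
        refs.append(x)
    return refs


def online_references(
    x_actual: float,
    x_planned: float,
    t_i: float,
    v_ref: SpeedProfile,
    dt: float,
    horizon: int,
    tolerance: float = 0.5,  # hypothetical lag threshold in metres
) -> List[float]:
    """Choose between nominal (blue) and re-planned (green) references at step i."""
    if x_planned - x_actual > tolerance:
        # Not enough progress: re-plan from the actual position x_i (green).
        return horizon_references(x_actual, t_i, v_ref, dt, horizon)
    # Progress as expected: continue the nominal plan (blue).
    return horizon_references(x_planned, t_i, v_ref, dt, horizon)


if __name__ == "__main__":
    constant_speed = lambda t: 2.0
    print(online_references(10.0, 10.0, 5.0, constant_speed, 0.5, 3))  # [11.0, 12.0, 13.0]
    print(online_references(8.0, 10.0, 5.0, constant_speed, 0.5, 3))   # [9.0, 10.0, 11.0]
```

One plausible motivation for this rule is that re-planning from the actual position keeps the MPC from chasing references that are already out of reach, which would otherwise demand large corrective inputs after the vehicle loses progress in soft soil.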