Actor-Critic Cooperative Compensation to Model Predictive Control for Off-Road Autonomous Vehicles Under Unknown Dynamics
Prakhar Gupta, Jonathon M Smereka, Yunyi Jia
TL;DR
Problem addressed: achieving reliable longitudinal speed tracking for off-road vehicles on terrain with unknown dynamics. Approach: a cooperative parallel compensation scheme (AC3MPC) that couples a model predictive controller (MPC) with a learning-based actor-critic (AC), using augmented dynamics to anticipate compensation. Key contributions: a data-efficient learning framework, preservation of the MPC's inherent robustness, and improved tracking across unseen deformable terrains. Findings: AC3MPC outperforms standalone MPC and AC by up to 29.2% and 10.2%, respectively, in RMS tracking error, generalizes better, and requires less training data, performing better even when under-trained. Significance: the method enables safer, more efficient real-time control on deformable terrains, with potential for deployment on real drive systems.
Abstract
This study presents an Actor-Critic Cooperative Compensated Model Predictive Controller (AC3MPC) designed to address unknown system dynamics. To avoid the difficulty of modeling highly complex dynamics while ensuring real-time control feasibility and performance, this work combines deep reinforcement learning with a model predictive controller in a cooperative framework. The model-based controller takes on the primary role, while both controllers are provided with predictive information about the other. This improves tracking performance and retains the inherent robustness of the model predictive controller. We evaluate this framework for off-road autonomous driving on unknown deformable terrains that represent sandy deformable soil, sandy and rocky soil, and cohesive clay-like deformable soil. Our findings demonstrate that our controller statistically outperforms standalone model-based and learning-based controllers by up to 29.2% and 10.2%, respectively. The framework generalized well over varied and previously unseen terrain characteristics, tracking longitudinal reference speeds with lower errors. Furthermore, it required significantly less training data than a purely learning-based controller, while delivering better performance even when under-trained.
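
As a rough illustration of the parallel compensation idea described above, the following minimal Python sketch sums the action of a model-based controller with a compensating action and compares RMS tracking errors with and without compensation. It is a sketch under stated assumptions, not the authors' implementation: the first-order speed model, the constants `DT` and `DRAG`, the one-step `model_based_action` standing in for the MPC, and the hand-written `PlaceholderCompensator` standing in for a trained actor are all hypothetical. No learning is performed, and the exchange of predictive information between the two controllers is omitted.

```python
import math

DT = 0.1    # control period in seconds (assumed value)
DRAG = 0.2  # nominal drag coefficient known to the model-based controller (assumed)


def plant_step(v, u, resistance):
    """True longitudinal dynamics; `resistance` is the unmodeled terrain term."""
    return v + DT * (u - DRAG * v - resistance)


def model_based_action(v, v_ref):
    """One-step stand-in for the MPC (hypothetical): the input that would bring
    the nominal model (resistance = 0) to v_ref at the next step."""
    return (v_ref - v) / DT + DRAG * v


class PlaceholderCompensator:
    """Hand-written integral rule standing in for a trained actor.
    This is NOT an actor-critic; it only fills the compensator slot."""

    def __init__(self, gain=5.0):
        self.gain = gain
        self.u_comp = 0.0

    def act(self, error):
        # Accumulate the tracking error into the compensating action.
        self.u_comp += self.gain * error
        return self.u_comp


def rms(errors):
    """Root-mean-square of a sequence of tracking errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def simulate(use_compensation, v_ref=5.0, resistance=3.0, steps=100):
    """Track a constant reference speed and return the RMS tracking error."""
    v = v_ref  # start on the reference to avoid a start-up transient
    compensator = PlaceholderCompensator()
    errors = []
    for _ in range(steps):
        error = v_ref - v
        errors.append(error)
        u = model_based_action(v, v_ref)
        if use_compensation:
            # Parallel compensation: the two actions are summed.
            u += compensator.act(error)
        v = plant_step(v, u, resistance)
    return rms(errors)


rms_nominal = simulate(use_compensation=False)
rms_compensated = simulate(use_compensation=True)
# Relative reduction in RMS error, in percent (toy numbers, not paper results).
improvement_pct = 100.0 * (rms_nominal - rms_compensated) / rms_nominal
```

In this toy setting the model-based controller alone settles at a constant speed offset caused by the unmodeled resistance, while the summed action reduces that offset over time. The resulting `improvement_pct` is specific to these assumed values and is unrelated to the percentages reported in the abstract.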
