A Study on the Use of Simulation in Synthesizing Path-Following Control Policies for Autonomous Ground Robots
Harry Zhang, Stefan Caldararu, Aaron Young, Alexis Ruiz, Huzaifa Unjhawala, Ishaan Mahajan, Sriram Ashokkumar, Nevindu Batagoda, Zhenhao Zhou, Luning Bakke, Dan Negrut
TL;DR
The paper addresses the challenge of synthesizing path-following policies for autonomous ground robots using a simulator with a validated digital twin. It compares four stock policies (PID, MPC, NN-MPC, and NN-HD). All four are synthesized in simulation, with the two NN policies trained via imitation learning, and are then evaluated zero-shot on a real 1/6th-scale ART vehicle. Test randomization is used to rank policy robustness. The key finding is that the policy ranking obtained in simulation correlates well with real-world performance on most paths. This shows that expeditious, simulator-driven policy synthesis is feasible given proper calibration and reproducibility, although sim-to-real gaps remain for some policies. The work provides a practical workflow for rapid policy synthesis and evaluation and highlights the role of test randomization in understanding robustness before hardware deployment.
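The central claim is that the simulation ranking of the policies "correlates well" with the real-world ranking. One standard way to quantify agreement between two tie-free rankings of n items is Spearman's rank correlation, rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the difference between an item's two ranks. The text above does not say which agreement metric, if any, the authors used. The following is a minimal Python sketch of the statistic, and the rankings in it are hypothetical placeholders labeled A to D, not the paper's results.

```python
def spearman_rho(rank_sim: dict[str, int], rank_real: dict[str, int]) -> float:
    """Spearman's rank correlation for two tie-free rankings of the same policies."""
    if rank_sim.keys() != rank_real.keys():
        raise ValueError("both rankings must cover the same policies")
    n = len(rank_sim)
    expected = list(range(1, n + 1))
    if n < 2 or sorted(rank_sim.values()) != expected or sorted(rank_real.values()) != expected:
        raise ValueError("need >= 2 policies ranked 1..n without ties")
    # d_i: difference between a policy's simulated rank and its real-world rank.
    d_squared = sum((rank_sim[p] - rank_real[p]) ** 2 for p in rank_sim)
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical placeholder rankings (1 = best) for four policies A-D.
# These are NOT the paper's measured rankings.
sim_rank = {"A": 1, "B": 2, "C": 3, "D": 4}
real_rank = {"A": 2, "B": 1, "C": 3, "D": 4}  # one adjacent swap

print(round(spearman_rho(sim_rank, real_rank), 3))  # 0.8
```

A value of 1 means identical rankings and -1 means fully reversed ones. With only four policies, a single adjacent swap already lowers rho to 0.8. `scipy.stats.spearmanr` computes the same statistic and also handles tied ranks.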
Abstract
We report results obtained and insights gained while answering the following question: how effective is it to use a simulator to establish path-following control policies for an autonomous ground robot? The answer depends on the quality of the simulator. For the simulation platform used herein, we found that producing four path-following control policies was straightforward once a digital twin of the controlled robot was available. The control policies established in simulation and subsequently demonstrated in the real world are PID control, MPC, and two neural network (NN) based controllers. Training the two NN controllers via imitation learning was accomplished expeditiously using seven simple maneuvers: following three circles clockwise, following the same three circles counter-clockwise, and driving straight. A test randomization process that employs random micro-simulations is used to rank the "goodness" of the four control policies. The policy ranking noted in simulation correlates well with the ranking observed when the control policies were tested in the real world. The simulation platform used is publicly available and released as open source under the BSD 3-Clause license, and a public Docker image is available for reproducibility studies. The platform contains a dynamics engine, a sensor simulator, a ROS2 bridge, and a ROS2 autonomy stack, the latter employed in both the simulation and the real-world experiments.
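The abstract names test randomization with random micro-simulations as the ranking tool but gives no details at this point. The following Python sketch illustrates the general idea under loudly stated assumptions:

- Each micro-simulation is a short rollout of a kinematic bicycle model from a randomly perturbed pose relative to a straight reference line.
- Each rollout is scored by its mean absolute cross-track error.
- Policies are ranked by their average score over one shared set of random starts.

The vehicle parameters (`WHEELBASE`, `SPEED`, `MAX_STEER`), the scoring metric, and the two placeholder controllers `p_only` and `pd_like` are all hypothetical. They are not the paper's simulator or its PID, MPC, or NN policies.

```python
import math
import random
from typing import Callable

# A policy maps (cross-track error [m], heading error [rad]) to a steering angle [rad].
Policy = Callable[[float, float], float]

WHEELBASE = 0.5  # m, hypothetical
SPEED = 1.0      # m/s, hypothetical constant forward speed
MAX_STEER = 0.5  # rad, hypothetical steering limit
DT = 0.02        # s, integration step


def micro_sim(policy: Policy, y0: float, psi0: float, steps: int = 250) -> float:
    """Short rollout tracking the line y = 0; returns the mean absolute cross-track error."""
    y, psi, total = y0, psi0, 0.0
    for _ in range(steps):
        delta = max(-MAX_STEER, min(MAX_STEER, policy(y, psi)))
        # Kinematic bicycle model, explicit Euler step; the reference heading is 0.
        y += SPEED * math.sin(psi) * DT
        psi += SPEED / WHEELBASE * math.tan(delta) * DT
        total += abs(y)
    return total / steps


def rank_policies(policies: dict[str, Policy], n_trials: int = 200,
                  seed: int = 0) -> list[tuple[str, float]]:
    """Rank policies by average micro-simulation error (lower is better)."""
    rng = random.Random(seed)
    # One shared set of random initial poses, so every policy faces the same tests.
    starts = [(rng.uniform(-0.5, 0.5), rng.uniform(-0.3, 0.3)) for _ in range(n_trials)]
    scores = {
        name: sum(micro_sim(policy, y0, psi0) for y0, psi0 in starts) / n_trials
        for name, policy in policies.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Two placeholder controllers (NOT the paper's PID, MPC, or NN policies).
def p_only(e_y: float, e_psi: float) -> float:
    return -1.0 * e_y  # reacts to lateral offset only


def pd_like(e_y: float, e_psi: float) -> float:
    return -1.0 * e_y - 1.0 * e_psi  # heading term adds damping


for name, score in rank_policies({"p_only": p_only, "pd_like": pd_like}):
    print(f"{name}: mean |cross-track error| = {score:.3f} m")
```

Reusing one set of random starts for every policy makes the comparison paired, so differences in average score reflect the policies rather than the luck of the draw. In this toy setup `pd_like` ranks ahead of `p_only`. Its heading term acts like a derivative of the lateral offset and damps the oscillation that the proportional-only law leaves undamped.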
