A Study on the Use of Simulation in Synthesizing Path-Following Control Policies for Autonomous Ground Robots

Harry Zhang, Stefan Caldararu, Aaron Young, Alexis Ruiz, Huzaifa Unjhawala, Ishaan Mahajan, Sriram Ashokkumar, Nevindu Batagoda, Zhenhao Zhou, Luning Bakke, Dan Negrut

TL;DR

The paper addresses the challenge of synthesizing path-following policies for autonomous ground robots using a simulator with a validated digital twin. It compares four stock policies (PID, MPC, NN-MPC, and NN-HD) that are tuned or trained in simulation, with the NNs trained via imitation learning, and then evaluated zero-shot on a real 1/6th scale ART vehicle. Test randomization is used to rank policy robustness. The key finding is that the simulation-derived policy ranking correlates well with real-world performance on most paths, which indicates that expeditious, simulator-driven policy synthesis is feasible given proper calibration and reproducibility, though some gaps remain for certain policies. This work provides a practical workflow for rapid policy synthesis and evaluation and highlights the role of test randomization in understanding robustness prior to hardware deployment.

Abstract

We report results obtained and insights gained while answering the following question: how effective is it to use a simulator to establish path-following control policies for an autonomous ground robot? While the quality of the simulator conditions the answer to this question, we found that for the simulation platform used herein, producing four control policies for path planning was straightforward once a digital twin of the controlled robot was available. The control policies established in simulation and subsequently demonstrated in the real world are PID control, MPC, and two neural network (NN) based controllers. Training the two NN controllers via imitation learning was accomplished expeditiously using seven simple maneuvers: follow three circles clockwise, follow the same circles counter-clockwise, and drive straight. A test randomization process that employs random micro-simulations is used to rank the "goodness" of the four control policies. The policy ranking noted in simulation correlates well with the ranking observed when the control policies were tested in the real world. The simulation platform used is publicly available and released as open source under a BSD3 license; a public Docker image is available for reproducibility studies. It contains a dynamics engine, a sensor simulator, a ROS2 bridge, and a ROS2 autonomy stack, the latter employed in both the simulator and the real-world experiments.
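As a rough illustration of the imitation learning step mentioned in the abstract, the sketch below clones a hand-written expert from demonstration data. Everything in it is an assumption made for illustration: the expert law `expert_steering`, its gains, the two-feature state (lateral error and heading error), the random sampling that stands in for the seven maneuvers, and the linear policy that stands in for the paper's neural networks. It is not the authors' training code.

```python
import random


def expert_steering(lateral_err, heading_err):
    """Hypothetical expert: a proportional law standing in for MPC or a human driver."""
    return -0.8 * lateral_err - 1.5 * heading_err


def collect_demonstrations(n, seed=0):
    """Record (state, expert action) pairs; random states stand in for driven maneuvers."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        e = rng.uniform(-0.5, 0.5)   # lateral error [m], assumed range
        h = rng.uniform(-0.3, 0.3)   # heading error [rad], assumed range
        data.append(((e, h), expert_steering(e, h)))
    return data


def fit_linear_policy(data):
    """Least-squares fit of steering = w_e * e + w_h * h via the 2x2 normal equations."""
    s_ee = sum(e * e for (e, h), _ in data)
    s_eh = sum(e * h for (e, h), _ in data)
    s_hh = sum(h * h for (e, h), _ in data)
    s_ea = sum(e * a for (e, h), a in data)
    s_ha = sum(h * a for (e, h), a in data)
    det = s_ee * s_hh - s_eh * s_eh
    w_e = (s_ea * s_hh - s_eh * s_ha) / det
    w_h = (s_ee * s_ha - s_eh * s_ea) / det
    return w_e, w_h


def make_policy(weights):
    """Wrap the fitted weights as a callable policy with the expert's signature."""
    w_e, w_h = weights
    return lambda lateral_err, heading_err: w_e * lateral_err + w_h * heading_err


if __name__ == "__main__":
    weights = fit_linear_policy(collect_demonstrations(200))
    cloned = make_policy(weights)
    print("fitted weights:", weights)
    print("expert vs. clone:", expert_steering(0.1, -0.05), cloned(0.1, -0.05))
```

Because the assumed expert is itself linear, the fit recovers its gains almost exactly. With a nonlinear expert such as MPC, a richer function approximator would be needed, which is presumably why the paper uses neural networks.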

Paper Structure

This paper contains 11 sections, 6 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Study overview: MPC, NN, and PID policies verified, trained, or tuned in simulation (upper box); policies subsequently tested in reality using a 1/6th scale vehicle (lower box).
  • Figure 2: Handling of the NN-based policies: the upper half illustrates the approach for collecting NN-HD training data, which is done with a human in the loop; the lower half shows the data collection process for NN-MPC, which relies on an MPC expert, following the idea in [pan2020imitation].
  • Figure 3: Micro-simulations: green area stands for a schematic of the "reference tube" (note that vehicle orientation also defines a tube, not shown here); the figure shows two micro-simulations with different settling time lengths.
  • Figure 4: Left column: sample simulation trajectories associated with Path 1 (top) and Path 2 (bottom). Right column: sample real-world trajectories associated with Path 1 (top) and Path 2 (bottom). See the lateral error and heading error tables for quantitative information regarding the performance of the MPC, NN-MPC, PID, and NN-HD control policies in simulation and in the real world.
  • Figure 5: Left column: sample simulation trajectories associated with Path 3. Right column: sample real-world trajectories associated with Path 3. See the lateral error and heading error tables for quantitative information regarding the performance of the MPC, NN-MPC, PID, and NN-HD control policies in simulation and in the real world.
  • ...and 2 more figures
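
The micro-simulation idea in the Figure 3 caption (a reference tube and a settling time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: the kinematic error model, the tube half-widths, the perturbation ranges, the two gain sets, and the use of mean settling time as the ranking score are all hypothetical choices made here. Each micro-simulation starts from a random perturbed state and reports the last time at which the vehicle is still outside the tube in lateral error or heading.

```python
import math
import random


def settling_time(gains, e0, psi0, speed=1.0, dt=0.01, horizon=20.0,
                  tube_e=0.05, tube_psi=0.05):
    """Run one micro-simulation and return the last time the state is outside the tube.

    Assumed model: lateral error e' = speed * sin(psi), heading error psi' = yaw_rate,
    with the policy yaw_rate = -k_e * e - k_psi * psi. The result is capped at horizon.
    """
    k_e, k_psi = gains
    e, psi = e0, psi0
    last_outside = 0.0
    for i in range(int(round(horizon / dt))):
        if abs(e) > tube_e or abs(psi) > tube_psi:
            last_outside = (i + 1) * dt
        yaw_rate = -k_e * e - k_psi * psi
        e += speed * math.sin(psi) * dt   # explicit Euler step
        psi += yaw_rate * dt
    return last_outside


def rank_policies(policies, n_tests=50, seed=0):
    """Score every policy on the same random initial states; lower mean is better."""
    rng = random.Random(seed)
    starts = [(rng.uniform(-0.5, 0.5), rng.uniform(-0.3, 0.3))
              for _ in range(n_tests)]
    scores = []
    for name, gains in policies.items():
        mean_t = sum(settling_time(gains, e0, p0) for e0, p0 in starts) / n_tests
        scores.append((name, mean_t))
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Two hypothetical gain sets standing in for different control policies.
    candidates = {"well_damped": (2.0, 3.0), "sluggish": (0.2, 0.3)}
    for name, mean_t in rank_policies(candidates):
        print(f"{name}: mean settling time {mean_t:.2f} s")
```

Reusing the same random initial states for every policy keeps the comparison paired, so differences in the score reflect the policies rather than the draw of test cases.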