Simulation-based reinforcement learning for real-world autonomous driving

Błażej Osiński, Adam Jakubowski, Piotr Miłoś, Paweł Zięcina, Christopher Galias, Silviu Homoceanu, Henryk Michalewski

TL;DR

The paper addresses the challenge of transferring an end-to-end driving policy, trained with reinforcement learning in a simulator, to a real vehicle. The policy takes RGB input from a single camera plus the output of a separately trained semantic segmentation module, and is trained in simulation with dense rewards and extensive domain randomization to promote transfer. The key findings are that regularization, segmentation as an auxiliary representation, and waypoint-based control improve real-world performance, while certain memory-based and discrete-action approaches can hinder transfer. An offline proxy metric is also explored as a potential predictor of real-world autonomy. The work offers practical guidelines for sim-to-real autonomous driving, highlights the importance of design choices in perception, control, and training, and suggests avenues for improving robustness: alternative RL algorithms, bird's-eye-view (BEV) representations, and model-based enhancements to increase sample efficiency and transfer reliability.

Abstract

We use reinforcement learning in simulation to obtain a driving system controlling a full-size real-world vehicle. The driving policy takes RGB images from a single camera and their semantic segmentation as input. We use mostly synthetic data, with labelled real-world data appearing only in the training of the segmentation network. Using reinforcement learning in simulation and synthetic data is motivated by lowering costs and engineering effort. In real-world experiments we confirm that we achieved successful sim-to-real policy transfer. Based on the extensive evaluation, we analyze how design decisions about perception, control, and training impact the real-world performance.

Paper Structure

This paper contains 31 sections, 9 figures, 1 table.

Figures (9)

  • Figure 1: All real-world scenarios used in our experiments. Left map: (a) autouni-arc, (b) autouni-straight. Center map: (c) factory_city-overpass*, (d) factory_city-overpass_exit. Right map: (e) factory_city-tunnel-bt10*, (f) factory_city-bt10-u_turn, (g) factory_city-u_turn-sud_strasse, (h) factory_city-sud_strasse_u_turn*, (i) factory_city-u_turn-bt10*. Scenarios marked with * were used for training in simulation.
  • Figure 2: Network architecture.
  • Figure 3: Summary of experiments with baselines across nine real-world scenarios. The columns to the right show the mean and max of autonomy (the percentage of distance driven autonomously). Models are sorted according to their mean performance. Print in color for better readability.
  • Figure 4: Average deviation of models from expert trajectories. Measurements based on GPS. The graphs for all scenarios can be found on the website http://bit.ly/34xh7z4
  • Figure 5: Left: Episode scores obtained during training. The variant with less randomization is easier and faster to train. Right: On a holdout town with holdout weather, better results are achieved by a model trained with more randomization.
  • ...and 4 more figures
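
The Figure 3 caption defines autonomy as the percentage of distance driven autonomously and reports its mean and max over the scenarios. A minimal sketch of that computation follows. The log format (a list of `(distance_m, autonomous)` segments per scenario) and the function names are assumptions made for illustration; this excerpt does not say how the distances were recorded.

```python
from typing import Dict, Iterable, Tuple


def autonomy_percent(segments: Iterable[Tuple[float, bool]]) -> float:
    """Percentage of the total distance that was driven autonomously.

    Each segment is (distance_m, autonomous); `autonomous` is False while a
    human driver has taken over. This segment format is hypothetical.
    """
    total = 0.0
    auto = 0.0
    for distance_m, autonomous in segments:
        if distance_m < 0:
            raise ValueError("segment distance must be non-negative")
        total += distance_m
        if autonomous:
            auto += distance_m
    if total == 0:
        raise ValueError("no distance driven")
    return 100.0 * auto / total


def summarize(per_scenario: Dict[str, float]) -> Tuple[float, float]:
    """Mean and max autonomy over scenarios, as in the Figure 3 columns."""
    if not per_scenario:
        raise ValueError("no scenarios given")
    values = list(per_scenario.values())
    return sum(values) / len(values), max(values)


# Made-up example data: scenario names come from Figure 1, distances do not.
runs = {
    "autouni-arc": [(80.0, True), (20.0, False)],
    "autouni-straight": [(50.0, True)],
}
scores = {name: autonomy_percent(segs) for name, segs in runs.items()}
mean_autonomy, max_autonomy = summarize(scores)
```

Weighting by distance rather than by time keeps the metric from rewarding a policy that simply drives slowly between interventions.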