Autonomous Algorithm for Training Autonomous Vehicles with Minimal Human Intervention

Sang-Hyun Lee; Daehyeok Kwon; Seung-Woo Seo

TL;DR

This paper tackles the real-world training bottleneck for autonomous vehicles by integrating three components into a model-agnostic RL framework. (1) Episodes are aborted when state novelty indicates a potentially unsafe state; novelty is measured as $e(s)=\|\hat{f}_{\theta}(s)-f(s)\|$, where $f(s)$ is the target and $\hat{f}_{\theta}(s)$ is the predictor. (2) Safety-aware resets, guided by a rule-based reset policy $\pi_r(a|s)$, return the vehicle to a goal state and enable diverse resets. (3) Informative initial states are identified via $e_i = \mathbb{E}_{s \sim D_r^i}[\|\hat{f}_{\theta}(s)-f(s)\|]$ and sampled from $I_k = \{ i \in I \mid \lambda_1 \le e_i \le \lambda_2\}$, building a curriculum that adapts to learning progress. The approach is compatible with any RL method and is validated in CARLA across multiple urban driving tasks, demonstrating competitive driving performance with significantly less human intervention than baselines. Key contributions include a formal problem formulation for real-world RL with minimal resets, a novelty-driven abort mechanism, safety-aware reset behaviors leveraging rule-based methods, and an adaptive initial-state curriculum that improves sample efficiency. The work highlights the practical value of reusing rule-based safety components to support RL training in real-world robotics and autonomous driving, with implications for safer and more scalable deployment.
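The novelty-based abort rule can be illustrated with a toy sketch. It computes $e(s)=\|\hat{f}_{\theta}(s)-f(s)\|$ with a fixed random target and a trained predictor, then aborts when the error exceeds a threshold. Everything concrete here is an assumption made for illustration and is not taken from the paper: the linear networks, the dimensions, the names `W_TARGET`, `train_predictor`, `should_abort`, and the fixed value of `ABORT_THRESHOLD`.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, EMBED_DIM = 4, 8  # hypothetical sizes

# Target f(s): fixed, randomly initialized, never trained (linear for simplicity).
W_TARGET = rng.normal(size=(STATE_DIM, EMBED_DIM))


def novelty(state, w_pred):
    """e(s) = || f_hat_theta(s) - f(s) || for a linear predictor w_pred."""
    return float(np.linalg.norm(state @ w_pred - state @ W_TARGET))


def train_predictor(states, w_pred, lr=0.5, epochs=500):
    """Fit the predictor to the target on visited states by gradient descent."""
    for _ in range(epochs):
        residual = states @ w_pred - states @ W_TARGET
        w_pred = w_pred - lr * states.T @ residual / len(states)
    return w_pred


# Visited states only vary in the first two dimensions, so the predictor
# never sees the last two and stays inaccurate there.
visited = np.zeros((256, STATE_DIM))
visited[:, :2] = rng.uniform(size=(256, 2))
w_pred = train_predictor(visited, np.zeros((STATE_DIM, EMBED_DIM)))

ABORT_THRESHOLD = 0.5  # hypothetical value, chosen only for this toy example


def should_abort(state, w_pred, threshold=ABORT_THRESHOLD):
    """Abort the episode when the estimated novelty is too high."""
    return novelty(state, w_pred) > threshold


familiar = np.array([0.5, 0.5, 0.0, 0.0])  # inside the visited region
novel = np.array([0.5, 0.5, 1.0, 1.0])     # leaves the visited region
print(should_abort(familiar, w_pred), should_abort(novel, w_pred))
```

In this sketch the predictor matches the target only on states resembling those it was trained on, so the error acts as a proxy for how unfamiliar a state is. A fixed threshold is the simplest choice for a demonstration; the figure overview indicates that in the paper the switch to the reset policy is pushed further back as training progresses.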

Abstract

Recent reinforcement learning (RL) algorithms have demonstrated impressive results in simulated driving environments. However, autonomous vehicles trained in simulation often struggle to work well in the real world due to the fidelity gap between simulated and real-world environments. While directly training real-world autonomous vehicles with RL algorithms is a promising approach to bypass the fidelity gap problem, it presents several challenges. One critical yet often overlooked challenge is the need to reset a driving environment between every episode. This reset process demands significant human intervention, leading to poor training efficiency in the real world. In this paper, we introduce a novel autonomous algorithm that enables off-the-shelf RL algorithms to train autonomous vehicles with minimal human intervention. Our algorithm reduces unnecessary human intervention by aborting episodes to prevent unsafe states and identifying informative initial states for subsequent episodes. The key idea behind identifying informative initial states is to estimate the expected amount of information that can be obtained from under-explored but reachable states. Our algorithm also revisits rule-based autonomous driving algorithms and highlights their benefits in safely returning an autonomous vehicle to initial states. To evaluate how much human intervention is required during training, we implement challenging urban driving tasks that require an autonomous vehicle to reset to initial states on its own. The experimental results show that our autonomous algorithm is task-agnostic and achieves competitive driving performance with much less human intervention than baselines.
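The selection of informative initial states can be sketched as a simple filter, following the TL;DR's definitions $e_i = \mathbb{E}_{s \sim D_r^i}[\|\hat{f}_{\theta}(s)-f(s)\|]$ and $I_k = \{ i \in I \mid \lambda_1 \le e_i \le \lambda_2\}$. The sketch assumes the per-state novelty values for each candidate's buffer $D_r^i$ have already been computed. The function names, the uniform sampling over $I_k$, the fallback when $I_k$ is empty, and the example numbers are all hypothetical.

```python
import numpy as np


def informative_initial_states(novelty_by_candidate, lam1, lam2):
    """Return I_k = {i | lam1 <= e_i <= lam2}, where e_i is the mean
    novelty over the states stored for candidate initial state i."""
    selected = []
    for i, novelties in novelty_by_candidate.items():
        e_i = float(np.mean(novelties))
        if lam1 <= e_i <= lam2:
            selected.append(i)
    return sorted(selected)


def sample_initial_state(novelty_by_candidate, lam1, lam2, rng):
    """Sample the next initial state uniformly from I_k (uniformity is an assumption)."""
    candidates = informative_initial_states(novelty_by_candidate, lam1, lam2)
    if not candidates:
        # Assumed fallback, not from the paper: use every candidate.
        candidates = sorted(novelty_by_candidate)
    return int(rng.choice(candidates))


# Hypothetical novelty values for four candidate initial states.
buffers = {
    0: [0.01, 0.02],  # well explored: little left to learn
    1: [0.30, 0.50],  # moderately novel: informative
    2: [2.00, 3.00],  # far too novel: likely not yet reachable safely
    3: [0.40, 0.60],  # moderately novel: informative
}
rng = np.random.default_rng(0)
print(informative_initial_states(buffers, lam1=0.1, lam2=1.0))  # [1, 3]
print(sample_initial_state(buffers, lam1=0.1, lam2=1.0, rng=rng))
```

The lower bound $\lambda_1$ excludes initial states the agent has already mastered, and the upper bound $\lambda_2$ excludes ones whose surroundings are still too unfamiliar. Because the predictor keeps training, the band shifts across candidates over time, which is how the curriculum adapts to learning progress.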

Paper Structure (14 sections, 4 equations, 7 figures, 2 tables, 1 algorithm)

INTRODUCTION
RELATED WORKS
TRAINING AUTONOMOUS VEHICLES WITH MINIMAL HUMAN INTERVENTION
    Problem Formulation
    Aborting Episodes to Prevent Unsafe States
    Returning with Safety-aware Reset Behaviors
    Identifying Informative Initial States
    Training Procedure Details
EXPERIMENTS
    Baselines
    Implementation Details
    Environments
    Experimental Results and Analysis
CONCLUSION

Figures (7)

Figure 1: Overview of our autonomous algorithm. Our algorithm aborts an episode when the estimated novelty of the current state is too high. After that, an autonomous vehicle is controlled with the reset policy to return to the next initial state without human intervention. The next initial state is sampled from a set of informative initial states. The timing of the switch to the reset policy is pushed further back as the training progresses.
Figure 2: Urban driving tasks introduced in our experiments. All spawned surrounding vehicles are set to ignore traffic signals, so an autonomous vehicle being trained should consider interactions with them to solve these tasks. The black dotted line denotes the route to a given goal.
Figure 3: Estimated novelty of sampled states in four-way unsignalized intersection and roundabout tasks. Each dimension of the states is normalized to [0, 1], and their colors represent the novelty estimated by the RND predictor. These results indicate that the state space where an autonomous vehicle can be trained continually broadens as the training progresses.
Figure 4: Forward step ratios for three-way and four-way unsignalized intersection tasks. The forward step ratio refers to the ratio between the number of forward time steps and the total number of time steps. The darker-colored lines and shaded areas denote the means and standard deviations over 10 random seeds, respectively.
Figure 5: Effects of identifying informative initial states on performance in five-way unsignalized intersection task. Our algorithm attains a lower average episode step and converges faster than the variant that samples initial states uniformly. The darker-colored lines and shaded areas denote the means and standard deviations over 10 random seeds, respectively.
...and 2 more figures
