Efficient Reinforcement Learning for Autonomous Driving with Parameterized Skills and Priors
Letian Wang, Jie Liu, Hao Shao, Wenshuo Wang, Ruobing Chen, Yu Liu, Steven L. Waslander
TL;DR
The paper tackles learning autonomous driving policies in dense, interactive traffic, where reinforcement learning directly over low-level controls is sample-inefficient. It introduces ASAP-RL, which learns over parameterized ego-centric motion skills and leverages expert priors through skill parameter inverse recovery and a double initialization strategy that pretrains both the actor and the critic. The approach yields higher learning efficiency and better driving performance than baselines that use skills or priors separately, demonstrated across highway, intersection, and roundabout scenarios with sparse rewards. By operating in the skill parameter space and exploiting priors, ASAP-RL achieves robust, diverse, and safe driving maneuvers with improved sample efficiency. The work offers a practical route toward deploying RL-based autonomous driving in complex traffic and includes open-source code to facilitate further research.
Abstract
When autonomous vehicles are deployed on public roads, they will encounter countless and diverse driving situations. Many manually designed driving policies are difficult to scale to the real world. Fortunately, reinforcement learning (RL) has shown great success in many tasks through automatic trial and error. However, when it comes to autonomous driving in interactive dense traffic, RL agents either fail to reach reasonable performance or require a large amount of data. Our insight is that when humans learn to drive, they 1) make decisions over the high-level skill space instead of the low-level control space and 2) leverage expert prior knowledge rather than learning from scratch. Inspired by this, we propose ASAP-RL, an efficient reinforcement learning algorithm for autonomous driving that simultaneously leverages motion skills and expert priors. We first parameterize motion skills, which are diverse enough to cover various complex driving scenarios and situations. A skill parameter inverse recovery method is proposed to convert expert demonstrations from control space to skill space. A simple but effective double initialization technique is proposed to leverage expert priors while bypassing the issues of expert suboptimality and early performance degradation. We validate our proposed method on interactive dense-traffic driving tasks given simple and sparse rewards. Experimental results show that our method leads to higher learning efficiency and better driving performance relative to previous methods that exploit skills and priors differently. Code is open-sourced to facilitate further research.
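To make the idea of converting a demonstration from control space to skill space concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical two-parameter skill (a lateral offset and an end speed) with a hand-written decoder, `skill_to_trajectory`, and recovers the parameters whose decoded trajectory best matches an expert trajectory in the least-squares sense. The decoder, the parameter names, the default values, and the choice of Gauss-Newton with a finite-difference Jacobian are all illustrative assumptions.

```python
import numpy as np

def skill_to_trajectory(params, v0=10.0, horizon=10, dt=0.1):
    """Hypothetical skill decoder: (lateral offset, end speed) -> (horizon, 2) waypoints."""
    lat_offset, v_end = params
    s = np.arange(1, horizon + 1) / horizon      # normalized time in (0, 1]
    total_time = horizon * dt
    t = s * total_time
    # Speed ramps linearly from v0 to v_end; integrating gives longitudinal position.
    x = v0 * t + 0.5 * (v_end - v0) * t**2 / total_time
    # Lateral position follows a smoothstep profile from 0 to lat_offset.
    y = lat_offset * (3 * s**2 - 2 * s**3)
    return np.stack([x, y], axis=1)

def recover_skill_params(expert_traj, init=(0.0, 10.0), iters=5, eps=1e-6):
    """Fit skill parameters to an expert trajectory (assumed approach: Gauss-Newton)."""
    p = np.asarray(init, dtype=float)
    for _ in range(iters):
        # Residual between the decoded skill and the expert waypoints.
        r = (skill_to_trajectory(p) - expert_traj).ravel()
        # Finite-difference Jacobian of the residual w.r.t. each parameter.
        jac = np.empty((r.size, p.size))
        for k in range(p.size):
            dp = np.zeros_like(p)
            dp[k] = eps
            r_shifted = (skill_to_trajectory(p + dp) - expert_traj).ravel()
            jac[:, k] = (r_shifted - r) / eps
        # Solve the linearized least-squares problem for the update step.
        step, *_ = np.linalg.lstsq(jac, -r, rcond=None)
        p = p + step
    return p

if __name__ == "__main__":
    # Stand-in "expert" trajectory generated from known parameters.
    expert = skill_to_trajectory((1.5, 12.0))
    print(recover_skill_params(expert))
```

In this sketch the recovered parameters play the role of skill-space labels for the demonstration, which could then serve as supervised targets when pretraining a policy that outputs skill parameters. A real demonstration would not lie exactly on the skill manifold, so the fit would return the closest skill rather than an exact match.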
