Multi-Timescale Hierarchical Reinforcement Learning for Unified Behavior and Control of Autonomous Driving

Guizhe Jin; Zhuoren Li; Bo Leng; Ran Yu; Lu Xiong; Chen Sun

TL;DR

This work introduces a multi-timescale hierarchical reinforcement learning framework for autonomous driving that jointly trains a high-level motion-guidance policy and a low-level execution policy. The high level produces long-timescale motion guidance via a hybrid discrete-continuous action, while the low level generates short-timescale control commands, with a safety mechanism operating across both levels. Motion guidance is represented explicitly as a path-point-based trajectory, incrementally updated to avoid state inconsistencies in the low-level policy. Evaluations on simulator-based highway scenarios and the HighD dataset show improvements in driving efficiency, action consistency, and safety, with the safety-aware variant delivering the strongest performance and robustness.

Abstract

Reinforcement Learning (RL) is increasingly used in autonomous driving (AD) and shows clear advantages. However, most RL-based AD methods overlook policy structure design. An RL policy that outputs only short-timescale vehicle control commands produces fluctuating driving behavior because the network outputs themselves fluctuate, while one that outputs only long-timescale driving goals cannot achieve unified optimality of driving behavior and control. Therefore, we propose a multi-timescale hierarchical reinforcement learning approach. Our approach adopts a hierarchical policy structure in which high- and low-level RL policies are trained jointly to produce long-timescale motion guidance and short-timescale control commands, respectively. Motion guidance is explicitly represented by hybrid actions to capture multimodal driving behaviors on structured roads and to support incremental updates of the low-level extended state. Additionally, a hierarchical safety mechanism is designed to ensure multi-timescale safety. Evaluation in simulator-based and HighD dataset-based multi-lane highway scenarios demonstrates that our approach significantly improves AD performance, increasing driving efficiency, action consistency, and safety.
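
The two-timescale structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: both policies are hand-written stand-ins rather than trained networks, and the names `HybridAction`, `high_level_policy`, `low_level_policy`, `rollout`, the `high_level_period` parameter, and the choice of a lane-change index plus a target speed as the hybrid action are all assumptions made for illustration. The path-point trajectory representation and the safety mechanism are omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class HybridAction:
    """Hypothetical hybrid action: one discrete and one continuous component."""
    lane_change: int     # discrete part: -1 (left), 0 (keep), +1 (right)
    target_speed: float  # continuous part, in m/s (assumed parameter)


def high_level_policy(state, rng):
    """Stand-in for a learned high-level policy (random choice here)."""
    return HybridAction(rng.choice([-1, 0, 1]), rng.uniform(20.0, 30.0))


def low_level_policy(state, guidance):
    """Stand-in for a learned low-level policy: returns (steer, accel)."""
    steer = 0.1 * guidance.lane_change
    accel = 0.5 * (guidance.target_speed - state["speed"])
    # Clip acceleration to an assumed actuator range of [-3, 3] m/s^2.
    return steer, max(-3.0, min(3.0, accel))


def rollout(total_steps=20, high_level_period=5, dt=0.1, seed=0):
    """Run the loop: guidance refreshes every `high_level_period` steps,
    while control commands are issued at every step."""
    rng = random.Random(seed)
    state = {"speed": 25.0}
    guidance = None
    log = []
    for t in range(total_steps):
        if t % high_level_period == 0:
            guidance = high_level_policy(state, rng)  # long timescale
        steer, accel = low_level_policy(state, guidance)  # short timescale
        state["speed"] += accel * dt
        log.append((t, guidance, steer, accel))
    return log
```

With the defaults, `rollout()` issues 20 control commands but only 4 high-level decisions, so each piece of guidance is held fixed across 5 consecutive control steps.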
