Multi-Timescale Hierarchical Reinforcement Learning for Unified Behavior and Control of Autonomous Driving

Guizhe Jin, Zhuoren Li, Bo Leng, Ran Yu, Lu Xiong, Chen Sun

TL;DR

This work introduces a multi-timescale hierarchical reinforcement learning framework for autonomous driving that jointly trains a high-level motion-guidance policy and a low-level execution policy. The high level produces long-timescale motion guidance via a hybrid discrete-continuous action, while the low level generates short-timescale control commands, with a safety mechanism operating across both levels. Motion guidance is represented explicitly as a path-point-based trajectory, incrementally updated to avoid state inconsistencies in the low-level policy. Evaluations on simulator-based highway scenarios and the HighD dataset show improvements in driving efficiency, action consistency, and safety, with the safety-aware variant delivering the strongest performance and robustness.

Abstract

Reinforcement Learning (RL) is increasingly used in autonomous driving (AD) and shows clear advantages. However, most RL-based AD methods overlook policy structure design. An RL policy that outputs only short-timescale vehicle control commands produces fluctuating driving behavior because the network outputs themselves fluctuate, while one that outputs only long-timescale driving goals cannot achieve unified optimality of driving behavior and control. Therefore, we propose a multi-timescale hierarchical reinforcement learning approach. Our approach adopts a hierarchical policy structure in which high- and low-level RL policies are trained jointly to produce long-timescale motion guidance and short-timescale control commands, respectively. Motion guidance is explicitly represented by hybrid actions to capture multimodal driving behaviors on structured roads and to support incremental low-level extended-state updates. Additionally, a hierarchical safety mechanism is designed to ensure multi-timescale safety. Evaluation in simulator-based and HighD dataset-based multi-lane highway scenarios demonstrates that our approach significantly improves AD performance, increasing driving efficiency, action consistency, and safety.

Paper Structure

This paper contains 30 sections, 22 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: The framework of the multi-timescale hierarchical RL approach. It consists of two jointly trained policies at different levels, with supporting safety mechanisms. The high-level policy produces motion guidance that, combined with environment features, forms the extended input of the low-level policy, while low-level rewards are fed back in expectation to the high level for joint optimization.
  • Figure 2: Illustration of the low-level extended-state transition. Assume that $\pi ^ h$ generates a guidance-action of 'Lane Left' for the period $T^h$. When the vehicle crosses the lane divider at $iT^l$, the guidance-action observed from the agent's viewpoint becomes 'Lane Keeping', even though $T^h$ has not yet ended.
  • Figure 3: The training process of our method with comparison methods.
  • Figure 4: The joint distribution of acceleration and speed in testing.
  • Figure 5: The joint distribution of steering angle and CDD in testing.
  • ...and 1 more figure
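
The extended-state transition described in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal toy illustration, not the authors' implementation: the constants `STEPS_PER_HIGH` and `LATERAL_GAIN`, the `Guidance` container, the proportional stand-in for the low-level policy, and the convention that a higher lane index lies to the left are all assumptions introduced here. The sketch stores the guidance as an absolute target lane and recomputes the guidance-action from the ego vehicle's current lane at every low-level step, so the label can change inside a single high-level period.

```python
from dataclasses import dataclass

STEPS_PER_HIGH = 10   # assumed ratio T^h / T^l (placeholder value)
LATERAL_GAIN = 0.2    # assumed gain of the toy low-level controller


@dataclass
class Guidance:
    """Hypothetical container for one hybrid high-level action."""
    target_lane: int     # discrete part: absolute lane index (higher = further left, assumed)
    target_speed: float  # continuous part: desired speed in m/s


def relative_label(current_lane: int, guidance: Guidance) -> str:
    """Guidance-action as observed from the ego vehicle's current lane."""
    diff = guidance.target_lane - current_lane
    if diff > 0:
        return "Lane Left"
    if diff < 0:
        return "Lane Right"
    return "Lane Keeping"


def low_level_policy(lateral_pos: float, speed: float, guidance: Guidance):
    """Toy stand-in for pi^l: proportional tracking of the guidance."""
    lateral_rate = LATERAL_GAIN * (guidance.target_lane - lateral_pos)
    accel = 0.5 * (guidance.target_speed - speed)
    return lateral_rate, accel


def rollout_one_high_step(lateral_pos: float, speed: float, guidance: Guidance):
    """Run one high-level period T^h as STEPS_PER_HIGH low-level steps.

    lateral_pos is measured in lane widths, so round() gives the lane index.
    Returns the final lateral position, final speed, and the guidance-action
    label observed at each low-level step.
    """
    labels = []
    for _ in range(STEPS_PER_HIGH):
        current_lane = round(lateral_pos)
        # Incremental update: the label is recomputed from the current lane.
        labels.append(relative_label(current_lane, guidance))
        lateral_rate, accel = low_level_policy(lateral_pos, speed, guidance)
        lateral_pos += lateral_rate
        speed += 0.1 * accel
    return lateral_pos, speed, labels


if __name__ == "__main__":
    guidance = Guidance(target_lane=1, target_speed=27.0)
    _, _, labels = rollout_one_high_step(0.0, 25.0, guidance)
    print(labels)
```

With these placeholder values, the observed label starts as 'Lane Left' and switches to 'Lane Keeping' once the toy vehicle crosses the lane divider, before the high-level period ends. Keeping the target in absolute terms is one simple way to keep the low-level input consistent across that crossing; the paper's own path-point-based representation may differ in detail.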