Imagination-Augmented Hierarchical Reinforcement Learning for Safe and Interactive Autonomous Driving in Urban Environments

Sang-Hyun Lee, Yoonjae Jung, Seung-Woo Seo

TL;DR

IAHRL introduces imagination into hierarchical reinforcement learning to enable safe, interactive urban driving. The approach assigns low-level policies to imagine safe, structured behaviors for high-level actions, while a high-level policy interprets these imagined behaviors through a permutation-invariant attention mechanism to infer interactions with surrounding objects. The method uses a Soft Actor-Critic framework with a MaxEnt objective and defines high-level rewards as the sum of discounted low-level rewards to guide policy learning. Empirical results on five CARLA urban driving tasks show that IAHRL achieves higher success rates and shorter episode lengths than baselines, while maintaining safety and robustness to imagination errors. The work also contributes a permutation-invariant attention design that prioritizes the agent and demonstrates the value of imagination horizons in complex driving scenarios.
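
The reward definition mentioned above (a high-level reward equal to the sum of discounted low-level rewards) can be illustrated with a minimal sketch. The function name `high_level_reward`, the discount factor `gamma`, and the choice to start discounting at exponent 0 are assumptions made for illustration; the exact formulation in the paper's equations is not reproduced in this summary.

```python
def high_level_reward(low_level_rewards, gamma=0.99):
    """Discounted sum of the low-level rewards collected while a single
    high-level action is active (hypothetical helper, not the authors' code)."""
    total = 0.0
    for k, reward in enumerate(low_level_rewards):
        # The k-th low-level reward is weighted by gamma**k (assumed indexing).
        total += (gamma ** k) * reward
    return total


# Three low-level steps executed under one high-level action.
rewards = [1.0, 0.0, -0.5]
print(high_level_reward(rewards, gamma=0.5))  # 1.0 + 0.0 + 0.25 * (-0.5) = 0.875
```

Under this assumed form, a high-level action is credited with everything its low-level policy earns while it is active, with later steps weighted less.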

Abstract

Hierarchical reinforcement learning (HRL) incorporates temporal abstraction into reinforcement learning (RL) by explicitly taking advantage of hierarchical structure. Modern HRL typically designs a hierarchical agent composed of a high-level policy and low-level policies. The high-level policy selects, at a lower frequency, which low-level policy to activate, and the activated low-level policy selects an action at each time step. Recent HRL algorithms have achieved performance gains over standard RL algorithms in synthetic navigation tasks. However, we cannot apply these HRL algorithms to real-world navigation tasks. One of the main challenges is that real-world navigation tasks require an agent to perform safe and interactive behaviors in dynamic environments. In this paper, we propose imagination-augmented HRL (IAHRL), which efficiently integrates imagination into HRL to enable an agent to learn safe and interactive behaviors in real-world navigation tasks. Imagination refers to predicting the consequences of actions without interacting with the actual environment. The key idea behind IAHRL is that the low-level policies imagine safe and structured behaviors, and then the high-level policy infers interactions with surrounding objects by interpreting the imagined behaviors. We also introduce a new attention mechanism that allows our high-level policy to be permutation-invariant to the order of surrounding objects and to prioritize our agent over them. To evaluate IAHRL, we introduce five complex urban driving tasks, which are among the most challenging real-world navigation tasks. The experimental results indicate that IAHRL enables an agent to perform safe and interactive behaviors, achieving higher success rates and lower average episode steps than baselines.
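
The temporal abstraction described in the abstract (a high-level policy that picks a low-level policy at a lower frequency, and a low-level policy that acts at every time step) can be sketched as a toy control loop. Everything here is hypothetical and chosen only for illustration: the function `run_hierarchy`, the integer state, the threshold rule, and the fixed re-selection period. It shows the standard HRL loop, not IAHRL itself, where low-level policies produce imagined behaviors rather than single-step actions.

```python
def run_hierarchy(high_level_policy, low_level_policies, env_step,
                  state, total_steps, high_level_period):
    """Toy HRL loop: re-select the active low-level policy every
    `high_level_period` steps; the active policy acts at every step."""
    trace = []
    active = None
    for t in range(total_steps):
        if t % high_level_period == 0:
            # High-level decision, made at a lower frequency.
            active = high_level_policy(state)
        # Low-level decision, made at every time step.
        action = low_level_policies[active](state)
        state = env_step(state, action)
        trace.append((t, active, action))
    return state, trace


# Hypothetical 1-D example: policy 0 advances by one unit, policy 1 holds.
low_level = [lambda s: 1, lambda s: 0]
choose = lambda s: 0 if s < 3 else 1      # assumed high-level rule
step = lambda s, a: s + a                  # assumed toy dynamics

final_state, trace = run_hierarchy(choose, low_level, step,
                                   state=0, total_steps=6, high_level_period=3)
print(final_state)                 # 3
print([a for _, a, _ in trace])    # [0, 0, 0, 1, 1, 1]
```

The high-level policy is consulted only at steps 0 and 3, so the agent advances for three steps and then holds for three.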

Paper Structure

This paper contains 15 sections, 21 equations, 9 figures, 5 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Overview of IAHRL. Both standard HRL and IAHRL temporally decompose a given task with their hierarchical structures. However, unlike standard HRL, IAHRL allows a high-level policy to interpret the output of low-level policies. Furthermore, the low-level policies in IAHRL generate distinct behaviors rather than single-step actions.
  • Figure 2: Structure of IAHRL. IAHRL takes the localization and perception results as input and outputs a behavior to be transmitted to the control module. The low-level policies, implemented with an optimization-based behavior planner, imagine safe and structured behaviors for each high-level action. The high-level policy, implemented with a new attention mechanism, infers interactions with surrounding objects from the behaviors imagined by the low-level policies and selects the low-level policy that imagines the most interactive behavior.
  • Figure 3: Behavior representation in a Frenet frame. The behaviors imagined by our low-level policies are represented in the Frenet frame, which is defined by the tangential vector $\vec{t}_r$ and the normal vector $\vec{n}_r$.
  • Figure 4: Structure of our attention-based high-level policy. While the Key and Value networks, $W_k$ and $W_v$, take as input the imagined behaviors of our agent and surrounding objects, the Query network, $W_q$, only takes as input the imagined behavior of our agent. This proposed structure allows our agent to be permutation-invariant to the order of surrounding objects, while prioritizing our agent over them. In the left image, the route from the current position to a given goal is denoted as a black dotted line. The orange dashed line denotes the agent's behavior of entering the roundabout, and the blue dashed line denotes the agent's behavior of waiting to yield to surrounding vehicles. Similarly, the predicted behaviors of surrounding vehicles are denoted as green dashed lines.
  • Figure 5: Urban driving tasks introduced in our work. The route to a given goal is represented as a black dotted line. All spawned vehicles are set to ignore traffic signals, so our agent should consider interactions with surrounding vehicles to solve these tasks. Unlike in the other tasks, spawned vehicles in the lane-change task are initialized with different speeds ranging from 15 km/h to 30 km/h.
  • ...and 4 more figures
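
The attention structure described in the Figure 4 caption can be sketched as follows: the query is computed from the agent's imagined behavior only, while keys and values are computed from the agent and all surrounding objects. This is a minimal sketch under stated assumptions, not the authors' implementation. Scaled dot-product scoring, the flat feature-vector encoding of each behavior, the dimensions, and the random weights are all assumed; only the roles of $W_q$, $W_k$, and $W_v$ come from the caption.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attend(agent, others, W_q, W_k, W_v):
    """Query from the agent only; keys and values from agent + surrounding objects."""
    entities = [agent] + list(others)
    q = matvec(W_q, agent)
    keys = [matvec(W_k, e) for e in entities]
    values = [matvec(W_v, e) for e in entities]
    scale = math.sqrt(len(q))  # assumed scaled dot-product scoring
    weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
    dim = len(values[0])
    # The weighted sum over entities does not depend on their order.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


rng = random.Random(0)
d_in, d_out = 6, 4  # hypothetical feature and embedding sizes


def rand_matrix():
    return [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]


W_q, W_k, W_v = rand_matrix(), rand_matrix(), rand_matrix()
agent = [rng.uniform(-1, 1) for _ in range(d_in)]
others = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(3)]

out_a = attend(agent, others, W_q, W_k, W_v)
out_b = attend(agent, [others[2], others[0], others[1]], W_q, W_k, W_v)
print(all(abs(a - b) < 1e-9 for a, b in zip(out_a, out_b)))  # True
```

Because there is a single query tied to the agent, the output is a single vector for the agent, and reordering the surrounding objects only reorders the terms of a sum, which is why the two outputs agree up to floating-point rounding.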