Let Hybrid A* Path Planner Obey Traffic Rules: A Deep Reinforcement Learning-Based Planning Framework

Xibo Li, Shruti Patel, Christof Büskens

TL;DR

The paper addresses how autonomous vehicles can make high-level decisions that obey traffic rules while still producing feasible local trajectories. It introduces Time-Period Reinforcement Learning (TPRL), a PPO-based hierarchical framework in which a DRL agent issues lane-change decisions and a Hybrid A* planner generates the trajectory, with both integrated through the ADTF middleware. Traffic rules are enforced through reward terms derived from Linear Temporal Logic (LTL), which guide the agent to keep to the rightmost lane and to perform safe lane changes. The approach is validated in simulation and on real hardware, where it shows improved rule compliance, stability, and real-time feasibility compared with baselines such as DDQN.
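To make the idea of an LTL-derived reward term concrete, the following is a minimal sketch and not the paper's actual reward. It assumes a bounded-response rule of the form "whenever the right lane is free, the vehicle is in the rightmost lane within `horizon` steps", checked over a finite recorded trace. The function names, the `horizon` parameter, and the unit penalty are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def bounded_response_violations(
    trigger: Sequence[bool], response: Sequence[bool], horizon: int
) -> int:
    """Count steps t where trigger[t] holds but response is false
    at every step in the window [t, t + horizon].

    Only windows that fit completely inside the trace are judged,
    so the end of an episode is not penalized for missing data.
    """
    if len(trigger) != len(response):
        raise ValueError("trigger and response must have the same length")
    n = len(trigger)
    violations = 0
    for t in range(n):
        if trigger[t] and t + horizon < n:
            if not any(response[t : t + horizon + 1]):
                violations += 1
    return violations


def rule_reward(
    trigger: Sequence[bool],
    response: Sequence[bool],
    horizon: int,
    penalty: float = 1.0,  # hypothetical weight, not taken from the paper
) -> float:
    """Turn rule violations into a non-positive reward term."""
    return -penalty * bounded_response_violations(trigger, response, horizon)


# Hypothetical trace: is the right lane free, and is the ego vehicle in it?
right_lane_free = [False, True, True, True, True, False]
in_rightmost_lane = [False, False, False, False, True, True]

reward = rule_reward(right_lane_free, in_rightmost_lane, horizon=2)
print(reward)
```

In this trace only the window starting at step 1 passes without the vehicle reaching the rightmost lane, so a single penalty is applied. A real reward would combine several such terms, for example one per traffic rule.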

Abstract

Deep reinforcement learning (DRL) allows a system to interact with its environment and take actions by training an efficient policy that maximizes self-defined rewards. In autonomous driving, it can be used as a strategy for high-level decision making, whereas low-level algorithms such as the hybrid A* path planning have proven their ability to solve the local trajectory planning problem. In this work, we combine these two methods where the DRL makes high-level decisions such as lane change commands. After obtaining the lane change command, the hybrid A* planner is able to generate a collision-free trajectory to be executed by a model predictive controller (MPC). In addition, the DRL algorithm is able to keep the lane change command consistent within a chosen time-period. Traffic rules are implemented using linear temporal logic (LTL), which is then utilized as a reward function in DRL. Furthermore, we validate the proposed method on a real system to demonstrate its feasibility from simulation to implementation on real hardware.
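The abstract's statement that the lane change command stays consistent within a chosen time period can be illustrated with a small wrapper. This is a sketch under stated assumptions, not the authors' implementation: the policy is assumed to be any callable that maps an observation to an integer command, the period is assumed to be counted in control steps, and the class name `TimePeriodCommand` is hypothetical.

```python
from typing import Any, Callable, Optional


class TimePeriodCommand:
    """Hold a high-level command for a fixed number of steps.

    The wrapped policy is queried only when the current period has
    expired; in between, the last command is repeated so that a
    downstream planner sees a stable target.
    """

    def __init__(self, policy: Callable[[Any], int], period_steps: int) -> None:
        if period_steps < 1:
            raise ValueError("period_steps must be at least 1")
        self.policy = policy
        self.period_steps = period_steps
        self._steps_left = 0
        self._command: Optional[int] = None

    def step(self, observation: Any) -> int:
        # Query the policy only at the start of a new period.
        if self._steps_left == 0:
            self._command = self.policy(observation)
            self._steps_left = self.period_steps
        self._steps_left -= 1
        assert self._command is not None
        return self._command


# Hypothetical policy that simply echoes the observation as its command,
# which makes it easy to see when the policy was actually queried.
holder = TimePeriodCommand(policy=lambda obs: obs, period_steps=3)
commands = [holder.step(obs) for obs in range(7)]
print(commands)
```

With a period of three steps, the policy is consulted at steps 0, 3, and 6 only, and the command is held constant in between. This is the property that lets a trajectory planner finish a maneuver without the decision flipping mid-way.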

Paper Structure

This paper contains 15 sections, 7 equations, 10 figures, 4 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Clockwise from the left: two ADAS model cars driving cooperatively, with the high-level decisions made by the RL agent; real-time trajectory planning in the visualization; and the action distribution of the RL agent after training.
  • Figure 2: Reinforcement learning pipeline with ADTF framework
  • Figure 3: Hierarchical planning framework combining reinforcement learning with trajectory planning and control
  • Figure 4: Observation space in Frenet coordinates [werling2010optimal]
  • Figure 5: Time-Period Reinforcement Learning with the Hybrid A* path planner
  • ...and 5 more figures