Trajectory Planning for Autonomous Vehicle Using Iterative Reward Prediction in Reinforcement Learning

Hyunwoo Park

Trajectory Planning for Autonomous Vehicle Using Iterative Reward Prediction in Reinforcement Learning

Hyunwoo Park

TL;DR

The paper tackles instability and uncertainty in reinforcement learning–based autonomous-vehicle trajectory planning by introducing an iterative reward prediction framework with uncertainty propagation. It combines Reward Prediction (RP), Iterative Reward Prediction (IRP), and Kalman-filter–based uncertainty propagation with Minkowski-sum collision checks to stabilize learning and improve safety. Evaluated in the CARLA simulator across multiple scenarios, the approach yields substantial gains over baselines, including a 60.17% reduction in collisions and a 30.82x increase in average reward, with IRP plus uncertainty propagation performing best. Overall, the method enhances learning stability, safety awareness, and practical viability for robust AV trajectory planning, though challenges remain in very complex scenarios and safety-focused extensions are planned for future work.

Abstract

Traditional trajectory planning methods for autonomous vehicles have several limitations. For example, heuristic and explicit simple rules limit generalizability and hinder complex motions. These limitations can be addressed using reinforcement learning-based trajectory planning. However, reinforcement learning suffers from unstable learning, and existing reinforcement learning-based trajectory planning methods do not consider the uncertainties. Thus, this paper, proposes a reinforcement learning-based trajectory planning method for autonomous vehicles. The proposed method involves an iterative reward prediction approach that iteratively predicts expectations of future states. These predicted states are then used to forecast rewards and integrated into the learning process to enhance stability. Additionally, a method is proposed that utilizes uncertainty propagation to make the reinforcement learning agent aware of uncertainties. The proposed method was evaluated using the CARLA simulator. Compared to the baseline methods, the proposed method reduced the collision rate by 60.17 %, and increased the average reward by 30.82 times. A video of the proposed method is available at https://www.youtube.com/watch?v=PfDbaeLfcN4.

Trajectory Planning for Autonomous Vehicle Using Iterative Reward Prediction in Reinforcement Learning

TL;DR

Abstract

Paper Structure (21 sections, 7 equations, 5 figures, 1 table, 1 algorithm)

This paper contains 21 sections, 7 equations, 5 figures, 1 table, 1 algorithm.

INTRODUCTION
Related Works
RL-based Trajectory Planning
Control Command
Goal for Trajectory
Uncertainty-aware Planning
Method
Problem Formulation
Reward Prediction
Iterative Reward Prediction
Uncertainty Propagation
Overall Algorithm
Experiments
Experiment Configuration
Baseline Methods
...and 6 more sections

Figures (5)

Figure 1: The AV(red car) plans a trajectory(red boxes) along its lane while avoiding a parked car(blue car) on the right and a car that is changing lanes(another blue car) from the left lane. The AV's planned trajectories are represented as red boxes, and their trajectories considering uncertainty, are represented as rounded light red boxes. The other vehicles' predicted trajectories are represented as blue boxes, and their trajectories, considering uncertainty, are represented as rounded light blue boxes. The goal of each AV's trajectory is determined iteratively at each time step from the previously predicted state of the AV and the states of the other vehicles. The AV's states and corresponding goals are represented by red, orange, yellow, green, blue, and purple circles(shown in chronological order). The predicted states of the other vehicles and the iteratively predicted states of the AV are employed for the prediction of reward within the planning horizon, which stabilizes the RL learning process.
Figure 2: The AV(red car) is following the lane while the other traffic participant(blue car) attempts to change lanes. The predicted states of the AV and the other traffic participant are represented as red and blue boxes respectively. The state of the AV at time $t$ and its goal is represented as red circles. The predicted states of the other traffic participant are assumed to be given. The predicted states(red boxes) of the AV are predicted by the RL agent's action $\pi(s_{t,0})$, i.e., the goal of the trajectory.
Figure 3: Demonstration of inaccuracy problem of RP. Here, the agent has planned a trajectory that takes AV close to the reference. The trajectories planned at each time step beginning from each state-action pair are represented by different colors. The return predicted using RP, $Q_{RP}$, is based on the red trajectory planned at time $t$. The agent plans a new trajectory at each step; therefore, the planned trajectory that RP utilized at time $t$(red) and the actual trajectory executed by the AV(blue) are different. Thus, $Q_{RP}$ has an error $\epsilon$ compared to the true return $Q_{true}$, which is the return obtained by following the blue trajectory.
Figure 4: Predicted states(blue boxes) of other traffic participants(blue cars) and predicted states(red boxes) of the AV(red car) and actions of the RL agent. The action of the RL agent is determined iteratively from the previous state of the AV, its goal, and the states of the other vehicles. The goals and the AV's start states are represented as red, orange, yellow, green, blue, and purple circles in chronological order. The predicted states of the other vehicles and the iteratively predicted states of the AV are utilized to predict the rewards during the learning process.
Figure 5: Average reward per time step in an episode, collision rate, and success rate of all methods for all scenarios during training.

Trajectory Planning for Autonomous Vehicle Using Iterative Reward Prediction in Reinforcement Learning

TL;DR

Abstract

Trajectory Planning for Autonomous Vehicle Using Iterative Reward Prediction in Reinforcement Learning

Authors

TL;DR

Abstract

Table of Contents

Figures (5)