Trajectory Planning for Autonomous Vehicle Using Iterative Reward Prediction in Reinforcement Learning
Hyunwoo Park
TL;DR
The paper tackles instability and uncertainty in reinforcement learning–based autonomous-vehicle trajectory planning by introducing an iterative reward prediction framework with uncertainty propagation. It combines Reward Prediction (RP), Iterative Reward Prediction (IRP), and Kalman-filter–based uncertainty propagation with Minkowski-sum collision checks to stabilize learning and improve safety. Evaluated in the CARLA simulator across multiple scenarios, the approach yields substantial gains over baselines, including a 60.17% reduction in collisions and a 30.82x increase in average reward, with IRP plus uncertainty propagation performing best. Overall, the method enhances learning stability, safety awareness, and practical viability for robust AV trajectory planning, though challenges remain in very complex scenarios and safety-focused extensions are planned for future work.
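As a rough illustration of the two safety ingredients named above, the sketch below propagates a state covariance with the Kalman filter predict step and then uses the grown position uncertainty to widen a Minkowski-sum overlap test. Everything specific here is an assumption made for illustration and is not taken from the paper: the 1-D constant-velocity model, the axis-aligned rectangular footprints, the isotropic margin of `k` standard deviations, and the function names `kf_predict_cov` and `boxes_collide`.

```python
import math


def kf_predict_cov(P, dt, q_pos=0.0, q_vel=0.0):
    """Kalman predict step for the covariance only: P' = F P F^T + Q.

    Assumes a constant-velocity model with state [position, velocity],
    F = [[1, dt], [0, 1]] and Q = diag(q_pos, q_vel).
    P is a 2x2 nested list.
    """
    p00, p01 = P[0]
    p10, p11 = P[1]
    return [
        [p00 + dt * (p01 + p10) + dt * dt * p11 + q_pos, p01 + dt * p11],
        [p10 + dt * p11, p11 + q_vel],
    ]


def boxes_collide(c1, h1, c2, h2, margin=0.0):
    """Overlap test for two axis-aligned rectangles.

    c1, c2 are centers (x, y); h1, h2 are half-extents (hx, hy).
    The Minkowski sum of the two rectangles is a rectangle with
    half-extents h1 + h2, so the rectangles overlap exactly when the
    center offset lies inside it. `margin` inflates that rectangle.
    """
    dx = abs(c1[0] - c2[0])
    dy = abs(c1[1] - c2[1])
    return dx <= h1[0] + h2[0] + margin and dy <= h1[1] + h2[1] + margin


# Hypothetical scenario: predict an obstacle's covariance one second ahead.
P0 = [[1.0, 0.0], [0.0, 1.0]]
P1 = kf_predict_cov(P0, dt=1.0)

ego_center, ego_half = (0.0, 0.0), (2.0, 1.0)
obs_center, obs_half = (5.0, 0.0), (2.0, 1.0)

k = 1.0  # number of standard deviations used as safety margin (assumed)
sigma_pos = math.sqrt(P1[0][0])

nominal_hit = boxes_collide(ego_center, ego_half, obs_center, obs_half)
uncertain_hit = boxes_collide(
    ego_center, ego_half, obs_center, obs_half, margin=k * sigma_pos
)
```

In this toy case the nominal footprints do not overlap, but the uncertainty-inflated check flags a potential collision, which is the kind of signal an uncertainty-aware agent could be penalized for.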
Abstract
Traditional trajectory planning methods for autonomous vehicles have several limitations. For example, their reliance on heuristics and simple explicit rules limits generalizability and hinders complex motions. These limitations can be addressed with reinforcement learning-based trajectory planning. However, reinforcement learning suffers from unstable learning, and existing reinforcement learning-based trajectory planning methods do not account for uncertainty. This paper therefore proposes a reinforcement learning-based trajectory planning method for autonomous vehicles that addresses both problems. The method uses an iterative reward prediction approach that iteratively predicts the expectations of future states. The predicted states are used to forecast rewards, which are integrated into the learning process to enhance stability. In addition, uncertainty propagation is used to make the reinforcement learning agent aware of uncertainties. The proposed method was evaluated in the CARLA simulator. Compared with the baseline methods, it reduced the collision rate by 60.17% and increased the average reward by a factor of 30.82. A video of the proposed method is available at https://www.youtube.com/watch?v=PfDbaeLfcN4.
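The iterative reward prediction idea can be sketched as a short rollout: starting from the current state, repeatedly choose an action, predict the expected next state, and accumulate the reward forecast for that predicted state. This is a minimal sketch under stated assumptions, not the paper's implementation. The names `iterative_reward_prediction`, `policy`, `predict_next`, and `reward_fn`, the discount factor `gamma`, and the toy speed-tracking problem are all hypothetical. How the forecast enters the learning update is not specified in the abstract, so the sketch only computes the forecast.

```python
def iterative_reward_prediction(state, policy, predict_next, reward_fn,
                                horizon, gamma=0.99):
    """Forecast the discounted reward over `horizon` predicted steps.

    At each iteration the policy is queried on the *predicted* state,
    the prediction model returns the expected next state, and the
    reward of that predicted state is added to the running total.
    """
    total = 0.0
    discount = 1.0
    for _ in range(horizon):
        action = policy(state)
        state = predict_next(state, action)
        total += discount * reward_fn(state)
        discount *= gamma
    return total


# Toy problem (assumed): track a target speed of 10 m/s along a line.
TARGET_SPEED = 10.0
DT = 1.0


def toy_policy(state):
    """Accelerate at 1 m/s^2 while below the target speed, else coast."""
    _, speed = state
    return 1.0 if speed < TARGET_SPEED else 0.0


def toy_predict_next(state, accel):
    """Expected next (position, speed) under a constant-acceleration step."""
    pos, speed = state
    return (pos + speed * DT, speed + accel * DT)


def toy_reward(state):
    """Penalize deviation from the target speed."""
    _, speed = state
    return -abs(TARGET_SPEED - speed)


predicted_return = iterative_reward_prediction(
    state=(0.0, 8.0),
    policy=toy_policy,
    predict_next=toy_predict_next,
    reward_fn=toy_reward,
    horizon=3,
    gamma=1.0,
)
```

Because the forecast is built from predicted expectations rather than sampled outcomes, it gives the learner a lower-variance signal, which is consistent with the stability benefit the abstract describes.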
