Uncertainty-Aware DRL for Autonomous Vehicle Crowd Navigation in Shared Space

Mahsa Golchoubian, Moojan Ghafurian, Kerstin Dautenhahn, Nasser Lashgarian Azad

TL;DR

The paper tackles uncertainty in predicted pedestrian trajectories for autonomous vehicle (AV) crowd navigation by integrating a data-driven, uncertainty-aware pedestrian predictor with a model-free DRL planner. A novel uncertainty-aware reward and a realistic, Hamburg-derived simulation environment enable training that accounts for prediction covariance, resulting in safer, more human-like navigation. Quantitative results show a 40% reduction in collision rate and a 15% increase in minimum distance to pedestrians over the state-of-the-art baseline that ignores prediction uncertainty. The proposed method also outperforms model predictive control (MPC) that uses the same prediction uncertainties, in both performance and computation time. By explicitly using prediction uncertainty during training and planning, the work moves AV crowd navigation in shared spaces closer to human driving.

Abstract

Safe, socially compliant, and efficient navigation of low-speed autonomous vehicles (AVs) in pedestrian-rich environments necessitates considering pedestrians' future positions and interactions with the vehicle and others. Despite the inevitable uncertainties associated with pedestrians' predicted trajectories due to their unobserved states (e.g., intent), existing deep reinforcement learning (DRL) algorithms for crowd navigation often neglect these uncertainties when using predicted trajectories to guide policy learning. This omission limits the usefulness of predictions when they diverge from the ground truth. This work introduces an integrated prediction and planning approach that incorporates the uncertainties of predicted pedestrian states in the training of a model-free DRL algorithm. A novel reward function encourages the AV to respect pedestrians' personal space, decrease speed during close approaches, and minimize the collision probability with their predicted paths. Unlike previous DRL methods, our model, designed for AV operation in crowded spaces, is trained in a novel simulation environment that reflects realistic pedestrian behaviour in a shared space with vehicles. Results show a 40% decrease in collision rate and a 15% increase in minimum distance to pedestrians compared to the state-of-the-art model that does not account for prediction uncertainty. Additionally, the approach outperforms model predictive control methods that incorporate the same prediction uncertainties in terms of both performance and computational time, while producing trajectories closer to those of human drivers in similar scenarios.
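
The three reward components named in the abstract (personal space, speed during close approaches, and collision probability with predicted paths) can be pictured with a minimal sketch. This is not the paper's reward function: the function name, the linear penalty shapes, and every threshold and weight below (`personal_space`, `slow_zone`, `w_space`, `w_speed`, `w_prob`) are hypothetical values chosen only for illustration.

```python
def uncertainty_aware_reward(dist_to_ped, av_speed, collision_prob,
                             personal_space=1.0, slow_zone=3.0,
                             w_space=1.0, w_speed=0.1, w_prob=2.0):
    """Illustrative per-step penalty for one pedestrian.

    All thresholds and weights are hypothetical placeholders,
    not values taken from the paper.
    """
    # 1) Personal space: penalty grows linearly with intrusion depth (metres).
    r_space = -w_space * max(0.0, personal_space - dist_to_ped)

    # 2) Close approach: penalise speed only inside the slow zone,
    #    scaled by how close the pedestrian is (0 at the edge, 1 at contact).
    closeness = max(0.0, 1.0 - dist_to_ped / slow_zone)
    r_speed = -w_speed * av_speed * closeness

    # 3) Prediction uncertainty: penalise the probability of colliding
    #    with the pedestrian's predicted path.
    r_prob = -w_prob * collision_prob

    return r_space + r_speed + r_prob


# A pedestrian far away with zero collision probability incurs no penalty;
# a close, fast approach with non-zero collision probability is penalised.
far = uncertainty_aware_reward(dist_to_ped=10.0, av_speed=2.0, collision_prob=0.0)
near = uncertainty_aware_reward(dist_to_ped=0.5, av_speed=2.0, collision_prob=0.1)
```

In a sketch like this, the collision-probability term is the only one that depends on the predictor's covariance, which is where uncertainty would enter the training signal.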

Paper Structure

This paper contains 26 sections, 15 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: We propose the use of predicted position and its associated covariance in the training of a deep reinforcement learning model with a novel reward function. Through this uncertainty-aware coupled prediction and planning approach, our model learns safe, foresighted, and smooth navigation behaviours among pedestrians in a shared space.
  • Figure 2: Overview of the proposed framework. At each time step, current observations are input to the predictor, generating pedestrians' predicted positions and associated uncertainties over a horizon. These predictions inform collision probability calculations between the AV and each pedestrian's predicted path, contributing to the reward function of the DRL module. Utilizing this predictive data, the DRL motion planner is trained to generate a probability distribution of optimal actions, mitigating collision risk with both current and intended pedestrian paths.
  • Figure 3: Our integrated prediction and planning framework. The DRL network architecture is based on the model introduced in liu2023intention, with novel additions and modifications highlighted in orange. The inclusion of prediction uncertainties (covariance) is a key feature, integrated both as part of the observation state and in the formulation of our novel reward function. This covariance is obtained from the predictor module, which addresses pedestrian-vehicle interaction by incorporating the AV's state and an estimate of its future path as input. In our model, prediction covariance, alongside the states themselves, contributes to the human-human attention module.
  • Figure 4: Predicted trajectories of (a) the polar collision grid (PCG) model trained with only the negative log likelihood loss, compared to (b) the same model trained with an additional uncertainty loss, referred to as the Uncertainty-Aware PCG (UAW-PCG). The blue circles represent the $1\sigma$ standard deviation of the predicted distribution for the future position around the estimated mean (depicted in red). Agents' positions at the current time step are marked with a star, and larger markers along the vehicle's trajectory highlight positions at later times. Here "Ped" and "Veh" stand for pedestrian and vehicle, respectively.
  • Figure 5: Comparison of various model trajectories in identical test scenarios. The figure illustrates an autonomous vehicle (in black) navigating through a crowd of pedestrians (in red) towards a goal indicated by a star. It depicts the trajectories of both the AV and the pedestrians for the past 6 frames. A dashed yellow line encircles the current position of each pedestrian, representing their personal space. For models incorporating predictions and uncertainty, the mean trajectory and $1\sigma$ standard deviation of the pedestrians' future paths over the next 6 frames are depicted by yellow and blue ellipses, respectively. The AV's maximum sensor detection range is shown as a dashed black circle around its current position, with predictions being made only for pedestrians within this range. The detailed progress of each model in this scenario can be found in the attached video.
  • ...and 2 more figures
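
The captions above refer to two quantities derived from a predicted position and its covariance: a collision probability between the AV and a pedestrian's predicted position (Figure 2) and the $1\sigma$ ellipse drawn around the predicted mean (Figures 4 and 5). The sketch below shows one generic way to compute both for a 2D Gaussian prediction. It is an assumption-laden illustration, not the paper's method: the Monte Carlo estimator, the circular collision region, and the names `collision_probability` and `one_sigma_axes` are all hypothetical.

```python
import math
import random


def collision_probability(mean, cov, av_pos, radius, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(||p - av_pos|| <= radius) for p ~ N(mean, cov).

    `cov` is a 2x2 symmetric positive-definite matrix given as nested lists.
    The disk-shaped collision region is an assumption made for this sketch.
    """
    (sxx, sxy), (_, syy) = cov
    # Cholesky factor of the 2x2 covariance, used to draw correlated samples.
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(max(syy - l21 * l21, 0.0))

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mean[0] + l11 * z1
        y = mean[1] + l21 * z1 + l22 * z2
        if math.hypot(x - av_pos[0], y - av_pos[1]) <= radius:
            hits += 1
    return hits / n_samples


def one_sigma_axes(cov):
    """Semi-axis lengths of the 1-sigma ellipse (square roots of eigenvalues)."""
    (sxx, sxy), (_, syy) = cov
    half_trace = 0.5 * (sxx + syy)
    delta = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    return math.sqrt(half_trace + delta), math.sqrt(max(half_trace - delta, 0.0))


# Same predicted mean 2 m from the AV, with a confident and an uncertain prediction.
p_narrow = collision_probability((2.0, 0.0), [[0.01, 0.0], [0.0, 0.01]], (0.0, 0.0), 1.0)
p_wide = collision_probability((2.0, 0.0), [[1.0, 0.0], [0.0, 1.0]], (0.0, 0.0), 1.0)
axes = one_sigma_axes([[4.0, 0.0], [0.0, 1.0]])
```

With the same predicted mean, the wider covariance yields a higher estimated collision probability, which is the kind of information a planner loses when it consumes only the mean trajectory.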