Uncertainty-Aware Trajectory Prediction: A Unified Framework Harnessing Positional and Semantic Uncertainties

Jintao Sun, Hu Zhang, Gangyi Ding, Zhedong Zheng

Abstract

Trajectory prediction seeks to forecast the future motion of dynamic entities, such as vehicles and pedestrians, given a temporal horizon of historical movement data and environmental context. A central challenge in this domain is the inherent uncertainty in real-time maps, arising from two primary sources: (1) positional inaccuracies due to sensor limitations or environmental occlusions, and (2) semantic errors stemming from misinterpretations of scene context. To address these challenges, we propose a novel unified framework that jointly models positional and semantic uncertainties and explicitly integrates them into the trajectory prediction pipeline. Our approach employs a dual-head architecture to independently estimate semantic and positional predictions in a dual-pass manner, deriving prediction variances as uncertainty indicators in an end-to-end fashion. These uncertainties are subsequently fused with the semantic and positional predictions to enhance the robustness of trajectory forecasts. We evaluate our uncertainty-aware framework on the nuScenes real-world driving dataset, conducting extensive experiments across four map estimation methods and two trajectory prediction baselines. Results verify that our method (1) effectively quantifies map uncertainties along both positional and semantic dimensions, and (2) consistently improves the performance of existing trajectory prediction models across multiple metrics, including minimum Average Displacement Error (minADE), minimum Final Displacement Error (minFDE), and Miss Rate (MR). Code will be available at https://github.com/JT-Sun/UATP.

Paper Structure

This paper contains 15 sections, 15 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Motivation. The $6$ images on the left are captured by $6$ different cameras on the vehicle. Map estimation from RGB images remains challenging, and the estimated map thus inevitably contains noise, which accumulates as error in the trajectory prediction. Comparing the ground-truth HD map (a) with the predicted map in (b), we can see that errors usually occur in uncertain areas. Therefore, in this work, we intend to leverage two types of uncertainty, i.e., positional uncertainty and semantic uncertainty, to indicate map errors and mitigate their negative impacts. (c) shows the positional uncertainty for three map-element categories, each in its own color: green for boundary, blue for pedestrian crossing, and orange for divider; the red rectangle denotes the ego car. The greater the positional uncertainty of a map element, the larger the ellipse centered on it. (d) shows the semantic uncertainty of our constructed high-definition map, where the purple error band indicates the likelihood of an area being misclassified as another category.
  • Figure 2: Overall pipeline. Firstly, given six images from the vehicle cameras, we extract 2D features via a visual backbone and transform them into BEV features. Specifically, we harness the primary and auxiliary heads to predict two BEV features from different-level feature maps. Secondly, during the map estimation stage (bottom), we perform location regression and semantic regression on the features from both primary and auxiliary heads. The resulting primary and auxiliary map element vectors ($\boldsymbol{\mu}$ and $\boldsymbol{\mu'}$) are used to calculate the KL divergence as positional uncertainty $\boldsymbol{\beta}$. Similarly, we obtain the semantic scores $\boldsymbol{c}$ and $\boldsymbol{c'}$ from the corresponding MLPs, from which we derive the semantic uncertainty $\boldsymbol{\Delta c}$. Thirdly, we concatenate the high-definition map location information $\boldsymbol{\mu}$, the mean semantic score $\boldsymbol{\bar{c}}$, and their uncertainties $(\boldsymbol{\beta, \Delta c})$ as the input of the downstream model (GNN or Transformer Encoder) to facilitate scene understanding for trajectory prediction.
  • Figure 3: Effectiveness of our proposed method for estimating high-definition map positional and semantic uncertainty on the test set: the left figure shows a normal road scenario, while the right figure shows scenarios involving curved roads and parking lots. Green denotes road boundaries, blue denotes pedestrian crossings, orange denotes lane dividers, purple indicates category semantic uncertainty, gray denotes lane centerlines, the red vehicle denotes the ego vehicle, and the gray vehicles denote other agents.
  • Figure 4: Top Left: At busy intersections with dense map elements, our proposed uncertainty information improves vehicle trajectory prediction. Top Right: When turning, the camera perspective often fails to capture all surrounding road conditions, potentially causing trajectory predictions to extend beyond the road boundaries. Bottom Left: In complex environments with numerous occlusions, both types of uncertainties in map prediction increase. Bottom Right: When lane information is unclear, the environment is open, and map estimation is poor, our uncertainty information helps maintain high accuracy in trajectory prediction despite incomplete map input.
  • Figure 5: Map visualization of our uncertainty method on the Argoverse 2 sensor dataset. In the figure, green represents road boundaries, blue represents pedestrian crossings, orange represents lane dividers, purple indicates category semantic uncertainty, and the red vehicle denotes the ego vehicle.
  • ...and 2 more figures
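
The uncertainty fusion described in the Figure 2 caption can be sketched numerically. The toy NumPy example below is a minimal illustration, not the paper's implementation: it assumes each predicted map point is a diagonal Gaussian with a predicted standard deviation (`sigma`, a placeholder we introduce here), computes the per-point KL divergence between the primary and auxiliary predictions as positional uncertainty $\boldsymbol{\beta}$, takes the absolute difference of the two softmax score vectors as semantic uncertainty $\boldsymbol{\Delta c}$, and concatenates $(\boldsymbol{\mu}, \boldsymbol{\bar{c}}, \boldsymbol{\beta}, \boldsymbol{\Delta c})$ into one descriptor for the downstream predictor.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kl_diag_gaussian(mu1, sigma1, mu2, sigma2):
    # KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) per coordinate, diagonal covariance
    return (np.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2.0 * sigma2**2) - 0.5)

# Toy primary / auxiliary predictions for one map element (2 points, 2D coords)
mu      = np.array([[0.0, 0.0], [1.0, 0.1]])   # primary locations (mu)
mu_aux  = np.array([[0.1, 0.0], [0.9, 0.2]])   # auxiliary locations (mu')
sigma   = np.full_like(mu, 0.5)                # assumed predicted std-devs
sigma_aux = np.full_like(mu, 0.5)

# Positional uncertainty beta: per-point KL divergence, summed over coordinates
beta = kl_diag_gaussian(mu, sigma, mu_aux, sigma_aux).sum(axis=-1)

# Semantic scores c, c' over 3 classes (boundary, crossing, divider)
logits, logits_aux = np.array([2.0, 0.5, 0.1]), np.array([1.5, 1.0, 0.2])
c, c_aux = softmax(logits), softmax(logits_aux)
delta_c = np.abs(c - c_aux)    # semantic uncertainty (Delta c)
c_bar = 0.5 * (c + c_aux)      # mean semantic score

# Fused descriptor fed to the downstream model (GNN / Transformer Encoder)
feat = np.concatenate([mu.ravel(), c_bar, beta, delta_c])
```

With equal standard deviations the KL term reduces to $\|\boldsymbol{\mu}-\boldsymbol{\mu'}\|^2 / 2\sigma^2$, so $\boldsymbol{\beta}$ grows exactly where the two heads disagree about a point's location, which is the behavior the pipeline relies on.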