SemanticFormer: Holistic and Semantic Traffic Scene Representation for Trajectory Prediction using Knowledge Graphs

Zhigang Sun, Zixu Wang, Lavdim Halilaj, Juergen Luettin

TL;DR

SemanticFormer addresses trajectory prediction for autonomous driving by modeling the whole scene as a knowledge graph that encodes static map topology, dynamic agents, and the semantic relations between them. It combines a hierarchical heterogeneous scene graph encoder, meta-path-based reasoning, an agent motion and lane encoder, and a decoder based on a Laplacian mixture density network, followed by a refinement step that uses anchor paths and speed profiles. On nuScenes, it reports competitive ADE_K and FDE_K results. Replacing the original homogeneous graphs of VectorNet and LAformer with the knowledge graph improves those baselines by about 5% and 4%, respectively, which indicates that the representation is portable. The work highlights the importance of rich semantic context for multimodal trajectory prediction and suggests that the approach can be extended with additional knowledge such as traffic rules and signs, which could support broader real-world deployment.
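The summary reports ADE_K and FDE_K without defining them. The following is a minimal sketch of how these metrics are commonly computed for K candidate trajectories, assuming the usual "minimum over K" convention (average and final displacement error of the best candidate, with the two minima taken independently). The function names and toy coordinates are illustrative and do not come from the paper.

```python
import math

def displacement_errors(pred, truth):
    """Per-timestep Euclidean distances between a predicted and a true trajectory."""
    return [math.dist(p, t) for p, t in zip(pred, truth)]

def min_ade_fde(candidates, truth):
    """Return (minADE_K, minFDE_K) over K candidate trajectories.

    Assumption: each metric takes its own minimum over the K candidates,
    so the two values may come from different candidates.
    """
    ades, fdes = [], []
    for pred in candidates:
        errs = displacement_errors(pred, truth)
        ades.append(sum(errs) / len(errs))  # average displacement error
        fdes.append(errs[-1])               # final displacement error
    return min(ades), min(fdes)

# Toy example with K = 2 candidates and 3 future timesteps (units: metres).
truth = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
candidates = [
    [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)],  # constant 1 m lateral offset
    [(1.0, 0.0), (2.0, 0.0), (3.0, 1.5)],  # exact at first, 1.5 m off at the end
]
min_ade, min_fde = min_ade_fde(candidates, truth)
print(min_ade, min_fde)  # 0.5 1.0
```

In this toy case the second candidate gives the best average error while the first gives the best final error, which is why the two metrics can rank multimodal predictors differently.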

Abstract

Trajectory prediction in autonomous driving relies on an accurate representation of all relevant context in the driving scene, including traffic participants, road topology, and traffic signs, as well as their semantic relations to each other. Despite increased attention to this issue, most trajectory prediction approaches do not consider all of these factors sufficiently. We present SemanticFormer, a hybrid approach that predicts multimodal trajectories by reasoning over a semantic traffic scene graph. It uses high-level information in the form of meta-paths, i.e., trajectories on which an agent is allowed to drive, which are extracted from a knowledge graph and then processed by a novel pipeline based on multiple attention mechanisms to predict accurate trajectories. SemanticFormer comprises a hierarchical heterogeneous graph encoder that captures spatio-temporal and relational information across agents as well as between agents and road elements. Further, it includes a predictor that fuses the different encodings and decodes trajectories with probabilities. Finally, a refinement module assesses the permitted meta-paths of trajectories and speed profiles to obtain the final predicted trajectories. Evaluation on the nuScenes benchmark demonstrates improved performance compared to several SOTA methods. In addition, we demonstrate that our knowledge graph can be easily added to two existing graph-based SOTA methods, namely VectorNet and LAformer, replacing their original homogeneous graphs. The evaluation results suggest that adding our knowledge graph improves the performance of the original methods by 5% and 4%, respectively.
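To make the notion of meta-paths as permitted driving paths more concrete, here is a small sketch that enumerates lane sequences an agent could follow in a toy scene graph stored as typed triples. Everything in it is an assumption made for illustration: the relation names (`isOnLane`, `hasNextLane`, `hasLeftLane`), the set of drivable relations, and the depth-limited search are hypothetical and are not the paper's ontology or extraction procedure.

```python
from collections import defaultdict

# (subject, relation, object) triples of a toy scene graph.
# All node and relation names are hypothetical.
TRIPLES = [
    ("agent_1", "isOnLane", "lane_A"),
    ("lane_A", "hasNextLane", "lane_B"),
    ("lane_A", "hasLeftLane", "lane_C"),
    ("lane_B", "hasNextLane", "lane_D"),
    ("lane_C", "hasNextLane", "lane_E"),
]

# Relations an agent may follow while driving (an assumption of this sketch).
DRIVABLE = {"hasNextLane", "hasLeftLane"}

def build_index(triples):
    """Map each subject to its outgoing (relation, object) pairs."""
    index = defaultdict(list)
    for subj, rel, obj in triples:
        index[subj].append((rel, obj))
    return index

def permitted_paths(triples, agent, max_hops):
    """Enumerate lane sequences reachable from the agent's current lane."""
    index = build_index(triples)
    starts = [obj for rel, obj in index[agent] if rel == "isOnLane"]
    results = []

    def extend(path, hops_left):
        successors = [obj for rel, obj in index[path[-1]]
                      if rel in DRIVABLE and obj not in path]
        if hops_left == 0 or not successors:
            results.append(path)  # path cannot or should not grow further
            return
        for nxt in successors:
            extend(path + [nxt], hops_left - 1)

    for lane in starts:
        extend([lane], max_hops)
    return results

paths = permitted_paths(TRIPLES, "agent_1", max_hops=2)
for path in paths:
    print(" -> ".join(path))
```

Each returned sequence is one candidate route (going straight, or changing to the left lane and continuing) that a downstream model could encode or use to check predicted trajectories.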

Paper Structure

This paper contains 23 sections, 13 equations, 7 figures, 7 tables, 2 algorithms.

Figures (7)

  • Figure 1: Driving scenes represented in a heterogeneous graph capturing all relevant map details, traffic agents, and their semantic relationships.
  • Figure 2: SemanticFormer overview. Data Representation models the static map information and the dynamic agent interactions in a holistic knowledge graph. Scene Graph Encoder extracts meta-paths and generates holistic latent representations for agents and lanes. Probability Predictor fuses the encodings and outputs trajectory candidates. Prediction Refinement uses anchor paths and speed profiles to evaluate trajectories and generate the final predictions.
  • Figure 3: Illustration of the traffic scene ontologies [iccvw_leon]. The Agent Ontology defines agent attributes such as category, speed, position, and trajectory, and relationships to the map such as distance to lane and path distance. The Map Ontology defines map elements such as lane snippet, lane slice, and traffic light, and relations between map elements such as left/right lane and switch via double dashed line.
  • Figure 4: (a) Illustration of meta-paths depicting permitted trajectories. (b) Illustration of the participant interaction graph, characterized by four edge types: Longitudinal (green), Intersecting (gray), Lateral (red), and Pedestrian (yellow).
  • Figure 5: Illustration of the agent motion and lane encoder: a GNN and a GRU extract spatio-temporal information, and an attention mechanism models the lanes related to each participant.
  • ...and 2 more figures