A Generalizable Physics-guided Causal Model for Trajectory Prediction in Autonomous Driving

Zhenyu Zong, Yuchen Wang, Haohong Lin, Lu Gan, Huajie Shao

TL;DR

We propose a generalizable Physics-guided Causal Model (PCM) for zero-shot trajectory prediction. It comprises two core components: a Disentangled Scene Encoder, which uses intervention-based disentanglement to extract domain-invariant features from scenes, and a CausalODE Decoder, which uses a causal attention mechanism to integrate kinematic models with meaningful contextual information.

Abstract

Trajectory prediction for traffic agents is critical for safe autonomous driving. However, achieving effective zero-shot generalization in previously unseen domains remains a significant challenge. Motivated by the consistent nature of kinematics across diverse domains, we aim to incorporate domain-invariant knowledge to enhance zero-shot trajectory prediction capabilities. The key challenges include: 1) effectively extracting domain-invariant scene representations, and 2) integrating invariant features with kinematic models to enable generalized predictions. To address these challenges, we propose a novel generalizable Physics-guided Causal Model (PCM), which comprises two core components: a Disentangled Scene Encoder, which adopts intervention-based disentanglement to extract domain-invariant features from scenes, and a CausalODE Decoder, which employs a causal attention mechanism to effectively integrate kinematic models with meaningful contextual information. Extensive experiments on real-world autonomous driving datasets demonstrate our method's superior zero-shot generalization performance in unseen cities, significantly outperforming competitive baselines. The source code is released at https://github.com/ZY-Zong/Physics-guided-Causal-Model.
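
To make the phrase "consistent nature of kinematics across diverse domains" concrete, the following is a minimal sketch of a hand-written kinematic unicycle model rolled out with forward Euler steps. It is an illustration only and not the authors' method: the abstract does not specify the kinematic model or the integrator. The function name `rollout_unicycle`, the state layout `(x, y, heading, speed)`, the control layout `(accel, yaw_rate)`, and the step size `dt` are all assumptions introduced here.

```python
import math


def rollout_unicycle(state, controls, dt=0.1):
    """Euler-integrate a kinematic unicycle model (illustrative assumption).

    state:    (x, y, heading, speed) in metres, metres, radians, m/s.
    controls: iterable of (accel, yaw_rate) pairs, one per time step.
    Returns the list of (x, y) positions reached after each step.
    """
    x, y, heading, speed = state
    positions = []
    for accel, yaw_rate in controls:
        # Position advances along the current heading at the current speed.
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        # Heading and speed are then updated from the control inputs.
        heading += yaw_rate * dt
        speed = max(0.0, speed + accel * dt)  # vehicles do not reverse here
        positions.append((x, y))
    return positions


# Usage: drive straight along the x-axis at 10 m/s for 1 s (10 steps of 0.1 s).
straight = rollout_unicycle((0.0, 0.0, 0.0, 10.0), [(0.0, 0.0)] * 10)
# Usage: same speed with a constant positive yaw rate, which curves to the left.
left_turn = rollout_unicycle((0.0, 0.0, 0.0, 10.0), [(0.0, 0.5)] * 10)
```

The same update equations apply regardless of which city the vehicle drives in, which is the sense in which kinematics can serve as domain-invariant knowledge. In a learned variant, the controls (or the dynamics function itself) would be produced by a network rather than supplied by hand.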

Paper Structure (18 sections, 7 equations, 4 figures, 5 tables)

This paper contains 18 sections, 7 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Illustration of right-turn scenarios in two different cities. A model trained on city A predicts correctly there but fails in unseen city B, which has a different lane width, traffic conditions, and road alignment. To address this, our method leverages two kinds of generalizable knowledge to enhance domain generalization: domain-invariant knowledge separated from the spatial-temporal features, and vehicle kinematics.
  • Figure 2: Overall framework of the proposed method. It comprises two main parts: (a) a Disentangled Scene Encoder that extracts domain-invariant features; (b) a CausalODE Decoder that integrates domain-invariant features with vehicle kinematics learned by a neural ODE.
  • Figure 3: Trajectory prediction visualization for the top three methods on 8 different nuScenes scenarios. (a) Our method drives normally, while the others collide with pedestrians. (b) Our method stops before the pavement at a crossroad with heavy traffic, while the other two predictions collide with other vehicles. (c) Our method turns right smoothly, while G2LTraj turns left; the predictions of G2LTraj and Wayformer do not follow vehicle kinematics well in this scenario. (d) Our method turns left without leaving the road. (e) Our method correctly turns left at the crossroad. (f) Our method drives normally past vehicles parked beside the road, while the other methods stop. (g) Our method stops to avoid colliding with the pedestrian ahead. (h) Our method turns right smoothly, indicating that it follows vehicle kinematics.
  • Figure 4: Parameter sensitivity of $\lambda$.