A Cognitive-Based Trajectory Prediction Approach for Autonomous Driving

Haicheng Liao, Yongkang Li, Zhenning Li, Chengyue Wang, Zhiyong Cui, Shengbo Eben Li, Chengzhong Xu

TL;DR

This work tackles autonomous vehicle trajectory prediction by introducing HLTP, a cognitive-inspired teacher-student framework that imitates human visual attention and memory. The teacher module employs a vision-aware pooling mechanism, a surround-aware encoder, and a multimodal Gaussian decoder to generate multiple plausible maneuvers, while the student learns to predict from limited observations through knowledge distillation modulation. Across NGSIM, HighD, and the novel MoCAD dataset, HLTP demonstrates state-of-the-art accuracy with improved data efficiency and reduced model complexity, and it remains robust when observations are missing. The approach advances practical AV deployment by offering a lightweight, adaptable predictor that captures human-like perception and decision-making and is validated across diverse driving contexts. MoCAD’s right-hand-drive urban setting further broadens the evaluation of real-world trajectory prediction.

Abstract

In autonomous vehicle (AV) technology, the ability to accurately predict the movements of surrounding vehicles is paramount for ensuring safety and operational efficiency. Incorporating human decision-making insights enables AVs to more effectively anticipate the potential actions of other vehicles, significantly improving prediction accuracy and responsiveness in dynamic environments. This paper introduces the Human-Like Trajectory Prediction (HLTP) model, which adopts a teacher-student knowledge distillation framework inspired by human cognitive processes. The "teacher" model, equipped with an adaptive visual sector, mimics the visual processing of the human brain, particularly the functions of the occipital and temporal lobes. The "student" model focuses on real-time interaction and decision-making, drawing parallels to prefrontal and parietal cortex functions. This approach allows for dynamic adaptation to changing driving scenarios, capturing essential perceptual cues for accurate prediction. Evaluated using the Macao Connected and Autonomous Driving (MoCAD) dataset, along with the NGSIM and HighD benchmarks, HLTP demonstrates superior performance compared to existing models, particularly in challenging environments with incomplete data. The project page is available on GitHub.
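
To make the teacher-student idea concrete, the sketch below shows a generic knowledge distillation objective for trajectories: the student is penalized both for deviating from the ground truth and for deviating from the teacher's prediction. This is an illustrative assumption, not the paper's Knowledge Distillation Modulation strategy; the function names, the weighting parameter `alpha`, and the sample trajectories are all hypothetical.

```python
# Generic teacher-student distillation objective for trajectories.
# Illustrative only: this is NOT HLTP's KDM; names and values are hypothetical.


def mse(pred, target):
    """Mean squared error between two trajectories given as lists of (x, y)."""
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        total += (px - tx) ** 2 + (py - ty) ** 2
    return total / len(pred)


def distillation_loss(student_pred, teacher_pred, ground_truth, alpha=0.5):
    """Blend the ground-truth error with the student-teacher mismatch.

    alpha is a hypothetical weight: 1.0 ignores the teacher entirely,
    0.0 trains the student purely to imitate the teacher.
    """
    task_term = mse(student_pred, ground_truth)
    imitation_term = mse(student_pred, teacher_pred)
    return alpha * task_term + (1.0 - alpha) * imitation_term


# Hypothetical three-step future trajectories in metres.
ground_truth = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
teacher_pred = [(1.0, 0.0), (2.0, 0.1), (3.0, 0.2)]
student_pred = [(1.0, 0.0), (2.0, 0.2), (3.0, 0.4)]

loss = distillation_loss(student_pred, teacher_pred, ground_truth, alpha=0.5)
print(f"distillation loss: {loss:.4f}")
```

In a setup like the one the abstract describes, the teacher's prediction would come from the richer model and the student's from a lighter one; here both are fixed lists so the arithmetic can be followed by hand.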

Paper Structure

This paper contains 34 sections, 16 equations, 5 figures, and 13 tables.

Figures (5)

  • Figure 1: Illustration of the HLTP Model. The "teacher" model integrates an adaptive visual sector and surround-aware encoder, mirroring the occipital and temporal lobes' roles in visual processing. The "student" model emphasizes real-time interaction and decision-making, akin to the prefrontal and parietal cortex functions. These components collectively enable HLTP to replicate the intricate visual and cognitive tasks of human drivers, thereby enhancing trajectory prediction.
  • Figure 2: Overall "teacher-student" architecture of the HLTP. The Surround-aware Encoder and the Teacher Encoder within the "teacher" model process visual vectors and context matrices to produce surround-aware and visual-aware vectors, respectively. These vectors are then fed into the Teacher Multimodal Decoder, which enables the prediction of different potential maneuvers for the target vehicle, each with associated probabilities. The "student" model acquires knowledge from the "teacher" model using a Knowledge Distillation Modulation (KDM) training strategy.
  • Figure 3: Visualization of the shift-window function in SWA. A window shifting technique is used to capture features by moving windows across key and query vectors, similar to convolution. Overlapping shifted windows, highlighted in red, connect with prior layers, promoting interaction and information sharing between query and key vectors.
  • Figure 4: Autonomous driving testing platform of the University of Macau.
  • Figure 5: Multimodal probabilistic prediction (a) and visualizations (b) for the target vehicle on NGSIM. Heat maps illustrate the GMM of predictions: brighter areas denote higher probabilities. The target vehicle is marked in red and its surrounding vehicles in blue.
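
The heat maps described for Figure 5 can be read as a Gaussian mixture density evaluated over a spatial grid, with each mixture weight acting as a maneuver probability. The sketch below is a minimal illustration of that reading under stated assumptions: the components are axis-aligned (no x-y correlation), and the maneuver labels, weights, means, and standard deviations are hypothetical values, not results from the paper.

```python
import math

# Evaluate a 2-D Gaussian mixture on a grid, as a heat map would.
# All component parameters below are hypothetical, for illustration only.


def gaussian_2d(x, y, mean, std):
    """Density of an axis-aligned 2-D Gaussian (no x-y correlation assumed)."""
    mx, my = mean
    sx, sy = std
    z = ((x - mx) / sx) ** 2 + ((y - my) / sy) ** 2
    return math.exp(-0.5 * z) / (2.0 * math.pi * sx * sy)


def mixture_density(x, y, components):
    """Weighted sum of component densities; weights act as maneuver probabilities."""
    return sum(w * gaussian_2d(x, y, mean, std) for w, mean, std in components)


# One future time step: (probability, (mean_x, mean_y), (std_x, std_y)) in metres.
components = [
    (0.7, (30.0, 0.0), (2.0, 0.5)),  # hypothetical "keep lane" mode
    (0.3, (28.0, 3.5), (2.5, 0.8)),  # hypothetical "change lane" mode
]

# Coarse grid: 20 m to 40 m longitudinal, -2 m to 5 m lateral.
xs = [20.0 + i for i in range(21)]
ys = [-2.0 + 0.5 * j for j in range(15)]
heat = [[mixture_density(x, y, components) for x in xs] for y in ys]

# The brightest cell corresponds to the most probable predicted position.
peak_value, peak_x, peak_y = max(
    (heat[j][i], xs[i], ys[j]) for j in range(len(ys)) for i in range(len(xs))
)
print(f"brightest cell at x={peak_x} m, y={peak_y} m")
```

With these values the brightest cell sits at the mean of the higher-weight mode, which matches the caption's statement that brighter areas denote higher probabilities.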