MSTF: Multiscale Transformer for Incomplete Trajectory Prediction

Zhanwen Liu, Chao Li, Nan Yang, Yang Wang, Jiaqi Ma, Guangliang Cheng, Xiangmo Zhao

TL;DR

The paper tackles incomplete trajectory prediction for autonomous driving by introducing MSTF, an end-to-end Transformer-based framework that combines a Multiscale Attention Head (MAH) with an Information Increment-based Pattern Adaptive (IIPA) module. MAH captures multiscale temporal dependencies to mitigate the effects of missing values, while IIPA derives a continuity representation that guides predictions toward motion consistency. The approach is evaluated on the HighD and Argoverse datasets, where it is reported to be more robust and accurate than state-of-the-art methods, with insights into when the method excels and where it struggles in complex urban scenes. Because it predicts end to end from incomplete observations, the work aims to reduce error propagation from missing values and to support real-time decision-making in autonomous systems.

Abstract

Motion forecasting plays a pivotal role in autonomous driving systems, enabling vehicles to issue collision warnings and plan rational local paths based on predictions of the surrounding vehicles. However, prevalent methods often assume completely observed trajectories, neglecting the potential impact of missing values induced by object occlusion, scope limitation, and sensor failures. Such oversights inevitably compromise the accuracy of trajectory predictions. To tackle this challenge, we propose an end-to-end framework, termed Multiscale Transformer (MSTF), designed for incomplete trajectory prediction. MSTF integrates a Multiscale Attention Head (MAH) and an Information Increment-based Pattern Adaptive (IIPA) module. Specifically, the MAH component uses a multi-head attention mechanism to concurrently capture multiscale motion representations of the trajectory sequence at various temporal granularities. This approach facilitates the modeling of global dependencies in motion across different scales, thereby mitigating the adverse effects of missing values. Additionally, the IIPA module adaptively extracts a continuity representation of motion across time steps by analyzing the missing patterns in the data. The continuity representation describes the motion trend at a higher level, guiding MSTF to generate predictions consistent with motion continuity. We evaluate the proposed MSTF model on two large-scale real-world datasets. Experimental results demonstrate that MSTF surpasses state-of-the-art (SOTA) models in the task of incomplete trajectory prediction, showing its efficacy in addressing the challenges posed by missing values in motion forecasting for autonomous driving systems.

Paper Structure (15 sections, 10 equations, 5 figures, 2 tables)

This paper contains 15 sections, 10 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: (a) shows the distribution of the percentage of missing values per trajectory; most trajectories are missing values, in varying proportions. (b) shows a case in which vehicle 1 and vehicle 2 are occluded by vehicle 3 at times ${t_1}$ and ${t_2}$, respectively, leaving missing values in their trajectories.
  • Figure 2: Illustration of the proposed MSTF framework. (a) Generate the sequence mask matrix, in which both the number and the positions of the masks are random, and use it to mask the complete trajectories provided by the public datasets to obtain incomplete trajectories. (b) Construct the multiscale attention heads from predefined padding mask matrices with different temporal granularities to extract multiscale motion representations. (c) Perform information increment analysis based on the sequence mask matrix and the padding mask matrices to obtain a continuity representation across time steps. The future trajectory decoder outputs the future trajectory based on the multiscale motion representations and the continuity representation.
  • Figure 3: The computation process for attention head $i$. The padding mask determines the temporal scale of an attention head; the attention heads are identical except for their padding masks. In this example, $m_p^i$ is the padding mask matrix for $i = 2$, where the gray squares are 0 and the white ones are 1.
  • Figure 4: (a) shows a real highway scene where the HighD dataset was collected. (b) shows HD map data provided by the Argoverse dataset, illustrating the complex road topology where the data was collected.
  • Figure 5: Visualization of predictions for three different maneuvers at different missing rate intervals.
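
The masking steps described in the captions of Figures 2 and 3 can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The function names, the strided form of the padding mask, and the use of raw (x, y) positions as queries, keys, and values are all assumptions made for illustration; the captions do not specify the actual mask layout, and a real model would use learned projections.

```python
import math
import random


def random_sequence_mask(num_steps, max_missing, rng):
    """Return a 0/1 list; 0 marks a time step treated as missing.

    Both the number of missing steps and their positions are random.
    """
    num_missing = rng.randint(0, max_missing)
    missing = set(rng.sample(range(num_steps), num_missing))
    return [0 if t in missing else 1 for t in range(num_steps)]


def strided_padding_mask(num_steps, scale):
    """Hypothetical padding mask for one head (1 = may attend, 0 = blocked).

    Assumption: query t may see key j when |t - j| is a multiple of `scale`.
    The paper's actual mask pattern may differ.
    """
    return [[1 if abs(t - j) % scale == 0 else 0 for j in range(num_steps)]
            for t in range(num_steps)]


def masked_attention(queries, keys, values, pad_mask, seq_mask):
    """Scaled dot-product attention over keys that are allowed and observed."""
    dim = len(queries[0])
    out_dim = len(values[0])
    outputs = []
    for t, query in enumerate(queries):
        allowed = [j for j in range(len(keys)) if pad_mask[t][j] and seq_mask[j]]
        if not allowed:
            # No visible key for this query: fall back to a zero vector.
            outputs.append([0.0] * out_dim)
            continue
        scores = [sum(a * b for a, b in zip(query, keys[j])) / math.sqrt(dim)
                  for j in allowed]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([sum(w / total * values[j][d] for w, j in zip(weights, allowed))
                        for d in range(out_dim)])
    return outputs


# Toy usage: one 8-step track, up to 3 missing steps, three temporal scales.
rng = random.Random(0)
T = 8
seq_mask = random_sequence_mask(T, max_missing=3, rng=rng)
track = [[float(t), 0.5 * t] for t in range(T)]  # toy (x, y) positions
observed = [[x * m for x in point] for point, m in zip(track, seq_mask)]  # zero out missing steps
heads = {scale: masked_attention(observed, observed, observed,
                                 strided_padding_mask(T, scale), seq_mask)
         for scale in (1, 2, 4)}
```

Each entry of `heads` holds one output vector per time step for a single temporal scale. In this sketch the heads share the same inputs and differ only in their padding masks, mirroring the Figure 3 caption, while the sequence mask keeps missing steps from contributing as keys.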