Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM

Yizhou Huang; Yihua Cheng; Kezhi Wang

TL;DR

Trajectory Mamba uses a selective state-space model (SSM) in place of conventional self-attention in a three-encoder–decoder framework for motion prediction. By coupling joint polyline encoding with Cross-Tamba decoding and an RNN-based trajectory weighting module, the approach achieves near-linear computational cost while maintaining strong accuracy. Key contributions include the selective, input-dependent SSM attention, the joint encoding of pedestrians and traffic signals, and a cross-state-space decoder that shares a unified scene representation across targets. Empirical results on Argoverse 1 and 2 show a four-fold reduction in FLOPs and over 40% fewer parameters, with accuracy competitive with or superior to prior state-of-the-art methods, which suggests strong potential for real-time deployment in autonomous driving.

Abstract

Motion prediction is crucial for autonomous driving, as it enables accurate forecasting of future vehicle trajectories based on historical inputs. This paper introduces Trajectory Mamba, a novel and efficient trajectory prediction framework based on the selective state-space model (SSM). Conventional attention-based models incur computational costs that grow quadratically with the number of targets, which hinders their application in highly dynamic environments. In response, we leverage the SSM to redesign the self-attention mechanism in the encoder-decoder architecture, thereby achieving linear time complexity. To address the potential loss of prediction accuracy caused by modifying the attention mechanism, we propose a joint polyline encoding strategy that better captures the associations between static and dynamic contexts. Additionally, to balance prediction accuracy and inference speed, we adopt a decoder that differs entirely from the encoder. Through cross-state space attention, all target agents share the scene context, allowing the SSM to interact with the shared scene representation during decoding and thus infer different trajectories over the next prediction steps. Our model achieves state-of-the-art results in terms of inference speed and parameter efficiency on both the Argoverse 1 and Argoverse 2 datasets. It demonstrates a four-fold reduction in FLOPs compared to existing methods and reduces the parameter count by over 40%, while surpassing the performance of the vast majority of previous methods. These findings validate the effectiveness of Trajectory Mamba in trajectory prediction tasks.
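
To make the linear-time claim concrete, the following is a minimal NumPy sketch of a generic Mamba-style selective SSM scan. It is not the authors' implementation: the function name `selective_ssm_scan`, the projection matrices `W_B`, `W_C`, `W_dt`, the diagonal state matrix `A`, and all dimensions are assumptions chosen for illustration. The point it shows is that the state update is a single pass over the sequence, so cost grows linearly with sequence length rather than quadratically as in pairwise self-attention, while the step size and the input/output projections depend on the current input (the "selective" part).

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the step size positive.
    return np.logaddexp(0.0, x)


def selective_ssm_scan(x, A, W_B, W_C, W_dt):
    """Simplified selective SSM recurrence (illustrative, hypothetical names).

    x    : (L, D) input sequence of L tokens with D channels
    A    : (D, N) diagonal state matrix per channel (negative for stability)
    W_B  : (D, N) projection producing the input-dependent B_t
    W_C  : (D, N) projection producing the input-dependent C_t
    W_dt : (D, D) projection producing the input-dependent step size
    Returns y : (L, D)
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))            # hidden state, one N-vector per channel
    y = np.empty((L, D))
    for t in range(L):              # one pass over the sequence: O(L) steps
        dt = softplus(x[t] @ W_dt)  # (D,) input-dependent step size
        B = x[t] @ W_B              # (N,) input-dependent input projection
        C = x[t] @ W_C              # (N,) input-dependent output projection
        A_bar = np.exp(dt[:, None] * A)      # (D, N) discretized decay
        B_bar = dt[:, None] * B[None, :]     # (D, N) simplified discretization
        h = A_bar * h + B_bar * x[t][:, None]
        y[t] = h @ C                # (D,) read out the state
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, D, N = 8, 4, 3               # assumed toy sizes
    x = rng.normal(size=(L, D))
    A = -np.exp(rng.normal(size=(D, N)))
    W_B = rng.normal(size=(D, N))
    W_C = rng.normal(size=(D, N))
    W_dt = rng.normal(size=(D, D))
    print(selective_ssm_scan(x, A, W_B, W_C, W_dt).shape)
```

Each step touches only the fixed-size state `h`, so doubling the sequence length doubles the work. Practical implementations replace the Python loop with a parallel scan; how the paper's encoder and cross-state-space decoder arrange this recurrence over agents and scene tokens is not specified in the abstract and is not modeled here.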
