V2X-RECT: An Efficient V2X Trajectory Prediction Framework via Redundant Interaction Filtering and Tracking Error Correction

Xiangyan Kong, Xuecheng Wu, Xiongwei Zhao, Xiaodong Li, Yunyun Shi, Gang Wang, Dingkang Yang, Yang Liu, Hong Chen, Yulong Gao

TL;DR

This work designs a multi-source identity matching and correction module that leverages multi-view spatiotemporal relationships to achieve stable and consistent target association, mitigating the adverse effects of mismatches on trajectory encoding and cross-view feature fusion.

Abstract

V2X prediction can alleviate the perception incompleteness caused by limited line of sight by fusing trajectory data from infrastructure and vehicles, which is crucial to traffic safety and efficiency. However, in dense traffic scenarios, frequent identity switching of targets hinders cross-view association and fusion. Meanwhile, multi-source information tends to generate redundant interactions during the encoding stage, and traditional vehicle-centric encoding repeatedly re-encodes the same historical trajectory features, degrading real-time inference performance. To address these challenges, we propose V2X-RECT, a trajectory prediction framework designed for high-density environments. It enhances data association consistency, reduces redundant interactions, and reuses historical information to enable more efficient and accurate prediction. Specifically, we design a multi-source identity matching and correction module that leverages multi-view spatiotemporal relationships to achieve stable and consistent target association, mitigating the adverse effects of mismatches on trajectory encoding and cross-view feature fusion. We then introduce a traffic signal-guided interaction module that encodes the trend of traffic light changes as features and exploits their role in constraining spatiotemporal right of way to accurately filter key interacting vehicles, while capturing the dynamic impact of signal changes on interaction patterns. Furthermore, a local spatiotemporal coordinate encoding makes the features of historical trajectories and the map reusable, supporting parallel decoding and significantly improving inference efficiency. Extensive experimental results on the V2X-Seq and V2X-Traj datasets demonstrate that V2X-RECT achieves significant improvements over state-of-the-art (SOTA) methods, while also enhancing robustness and inference efficiency across diverse traffic densities.
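To make the idea of cross-view target association concrete, the following is a minimal sketch and not the matching and correction module proposed in the paper. It assumes each view provides tracks as a mapping from timestamp to an (x, y) position in a shared frame, and it swaps in a plain greedy nearest-track matching on the mean positional gap. The names `mean_gap`, `associate`, and the threshold `max_gap` are hypothetical and chosen only for illustration.

```python
from math import hypot


def mean_gap(track_a, track_b):
    """Mean Euclidean distance over timestamps present in both tracks.

    Returns None when the two tracks share no timestamp.
    """
    shared = set(track_a) & set(track_b)
    if not shared:
        return None
    total = sum(
        hypot(track_a[t][0] - track_b[t][0], track_a[t][1] - track_b[t][1])
        for t in shared
    )
    return total / len(shared)


def associate(infra_tracks, vehicle_tracks, max_gap=2.0):
    """Greedy one-to-one matching of track IDs across two views.

    max_gap is an assumed distance threshold (same unit as the positions);
    pairs whose mean gap exceeds it are never matched.
    """
    candidates = []
    for i_id, i_track in infra_tracks.items():
        for v_id, v_track in vehicle_tracks.items():
            gap = mean_gap(i_track, v_track)
            if gap is not None and gap <= max_gap:
                candidates.append((gap, i_id, v_id))

    # Closest pairs are committed first; each ID is used at most once.
    candidates.sort()
    matches, used_infra, used_vehicle = {}, set(), set()
    for gap, i_id, v_id in candidates:
        if i_id in used_infra or v_id in used_vehicle:
            continue
        matches[i_id] = v_id
        used_infra.add(i_id)
        used_vehicle.add(v_id)
    return matches


# Toy data: two infrastructure-side tracks, three vehicle-side tracks.
infra = {
    "I1": {0: (0.0, 0.0), 1: (1.0, 0.0)},
    "I2": {0: (10.0, 5.0), 1: (11.0, 5.0)},
}
vehicle = {
    "V7": {0: (0.2, 0.1), 1: (1.1, 0.0)},
    "V9": {1: (11.3, 5.0)},
    "V3": {0: (50.0, 50.0)},
}
matches = associate(infra, vehicle)
```

In this toy case `I1` pairs with `V7` and `I2` with `V9`, while `V3` stays unmatched because it is far from every infrastructure track. A greedy pass like this is the simplest possible baseline; it does nothing to repair identity switches over time, which is the problem the paper's module targets.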

Paper Structure

This paper contains 37 sections, 28 equations, 4 figures, 10 tables, 1 algorithm.

Figures (4)

  • Figure 1: Overall framework of V2X-RECT, a multi-view trajectory prediction model designed for cooperative driving environments and particularly suited to high-density traffic scenarios. The framework integrates five key mechanisms: (a) an identity matching and correction module that enhances data quality and optimizes V2X trajectory fusion; (b) a feature reuse-based encoding module that improves encoding efficiency and enables parallel computation; (c) a traffic signal-guided behavior-interaction module that models the impact of traffic signals on agent behavior and interactions; (d) a matching-based multi-view trajectory fusion module that improves cross-view trajectory consistency and accuracy; (e) a decoding module that uses a recurrent, anchor-free proposal generator to produce adaptive trajectory anchors and then refines these initial proposals based on the anchors.
  • Figure 2: Qualitative results of V2X-RECT in turning scenarios with varying traffic densities from the V2X-Seq dataset. The target vehicle is shown in orange, the predicted vehicles are shown in green, and other vehicles are shown in gray. Non-motorized road users are represented as gray dots. Ground-truth trajectories are drawn in black, and predicted trajectories are shown in various colors to distinguish different prediction modes.
  • Figure 3: Qualitative results of V2X-RECT in lane-changing scenarios with varying traffic densities from the V2X-Seq dataset. The target vehicle is shown in orange, the predicted vehicles are shown in green, and other vehicles are shown in gray. Non-motorized road users are represented as gray dots. Ground-truth trajectories are drawn in black, and predicted trajectories are shown in various colors to distinguish different prediction modes.
  • Figure 4: Qualitative results of V2X-RECT in straight-driving scenarios with varying traffic densities from the V2X-Seq dataset. The target vehicle is shown in orange, the predicted vehicles are shown in green, and other vehicles are shown in gray. Non-motorized road users are represented as gray dots. Ground-truth trajectories are drawn in black, and predicted trajectories are shown in various colors to distinguish different prediction modes.