MSMA: Multi-agent Trajectory Prediction in Connected and Autonomous Vehicle Environment with Multi-source Data Integration

Xi Chen, Rahul Bhadani, Zhanbo Sun, Larry Head

TL;DR

This work tackles trajectory prediction for surrounding vehicles in a mixed traffic scenario with a central CAV, leveraging both onboard sensors and V2X communications. It introduces MSMA, an encoder-decoder framework that uses source-specific temporal encoders and a cross-attention fusion module, along with Graph Attention Network–based agent-agent and agent-lane interactions, to jointly predict $D$ future trajectories per agent. The model is trained and evaluated on a customized CARLA Town03 dataset with synthesized latency and noise, and shows that multi-source data fusion improves prediction accuracy, particularly at higher CV market penetration rates, as measured by $\text{ADE}$, $\text{FDE}$, and $\text{MR}$. The study highlights MSMA as a first step toward effective multi-source trajectory forecasting in a CAV environment, while acknowledging limitations such as assuming broadcast-only CV trajectories, vehicle homogeneity, and the need for real-world data for broader validation.
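The three metrics named above can be illustrated with a minimal sketch. It assumes the common multi-modal convention in which, of the $D$ candidate trajectories for an agent, the one whose endpoint is closest to the ground truth is scored, and it assumes a 2.0 m miss threshold. Neither detail is stated in this summary, and the function names are hypothetical.

```python
import math

def displacement(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def best_mode_metrics(pred_modes, gt, miss_threshold=2.0):
    """Score one agent.

    pred_modes: D candidate trajectories, each a list of (x, y) points.
    gt: ground-truth future trajectory with the same number of points.
    miss_threshold: assumed value in meters, not taken from the paper.
    """
    # Assumption: pick the mode with the smallest endpoint error.
    best = min(pred_modes, key=lambda traj: displacement(traj[-1], gt[-1]))
    ade = sum(displacement(p, g) for p, g in zip(best, gt)) / len(gt)
    fde = displacement(best[-1], gt[-1])
    return ade, fde, fde > miss_threshold

def evaluate(agents, miss_threshold=2.0):
    """Average ADE and FDE over agents; MR is the fraction of misses."""
    results = [best_mode_metrics(m, g, miss_threshold) for m, g in agents]
    n = len(results)
    return {
        "ADE": sum(r[0] for r in results) / n,
        "FDE": sum(r[1] for r in results) / n,
        "MR": sum(1 for r in results if r[2]) / n,
    }
```

Selecting the mode by endpoint error is only one option; a variant that selects by average error would give a slightly different ADE.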

Abstract

The prediction of surrounding vehicle trajectories is crucial for collision-free path planning. In this study, we focus on a scenario where a connected and autonomous vehicle (CAV) serves as the central agent, utilizing both sensors and communication technologies to perceive the surrounding traffic, which consists of autonomous vehicles (AVs), connected vehicles (CVs), and human-driven vehicles (HDVs). Our trajectory prediction task targets all detected surrounding vehicles. To effectively integrate the multi-source data from both sensor and communication technologies, we propose a deep learning framework called MSMA that uses a cross-attention module for multi-source data fusion. Vector map data is used to provide contextual information. The trajectory dataset is collected in the CARLA simulator, with synthesized data errors introduced. Numerical experiments demonstrate that, in a mixed traffic flow scenario, integrating data from different sources enhances our understanding of the environment. This notably improves trajectory prediction accuracy, particularly in situations with a high CV market penetration rate. The code is available at: https://github.com/xichennn/MSMA.
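To make the idea of cross-attention fusion concrete, the following is a minimal sketch of scaled dot-product cross-attention between two feature sequences. It is not the MSMA implementation: the learned query, key, and value projections are omitted, the function names are hypothetical, and which source supplies the queries is an assumption made here for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: feature vectors from one source (assumed: sensor encoding).
    keys, values: feature vectors from the other source
                  (assumed: communication encoding).
    Returns one fused vector per query.
    """
    scale = math.sqrt(len(queries[0]))
    fused = []
    for q in queries:
        # Similarity of this query to every key of the other source.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted average of the other source's value vectors.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused
```

Each output row is a convex combination of the value vectors, so a query attends most to the time steps of the other source whose keys it matches best.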

Paper Structure (23 sections, 8 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Motivational scenarios. AV is in orange. (a) The AV is struck by an oncoming vehicle going straight (in green) when making a left turn, as its view is obstructed by the truck. (b) The AV crashes into the back of the vehicle in front of it as the leading vehicle (in green) brakes abruptly.
  • Figure 2: The mixed traffic flow consists of a CAV as the central agent (red square), CVs (green stars), AVs, and HDVs. AVs and HDVs are grouped into NCVs (orange dots). The CAV learns about its surroundings through sensors and communication technology. The communication range (green circle) is typically greater than the sensing range (orange circle). Note: any other CAV in the surroundings of the central CAV is treated as a CV, because its trajectory is broadcast to the central CAV through communication.
  • Figure 3: (a) Traffic simulation in the CARLA simulator. (b) Map of CARLA Town03.
  • Figure 4: Proposed model architecture. Synthesized source-specific errors are introduced into the historical trajectories, and each source is then encoded by a source-specific temporal encoder. Cross-attention is employed to fuse data from the two sources. Each agent is then modeled as a graph node, with its temporal feature as the node feature. GAT is leveraged to capture the agent-agent and agent-lane interactions. The resulting encodings from both GATs are concatenated as input to a multi-agent decoder that predicts the future trajectories of all target agents.
  • Figure 5: Ablation study for the fusion module. (a) Prediction performance for all the CVs in a scene when different time delays are introduced into their historical trajectories. (b) Prediction performance for all the vehicles within sensing range in a scene when Gaussian noise with different variances is introduced into their historical trajectories.
  • ...and 3 more figures
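
The two error types mentioned in the Figure 4 and Figure 5 captions (time delay on communicated trajectories, Gaussian noise on sensed trajectories) could be synthesized along the following lines. This is a sketch under stated assumptions, not the authors' procedure: the delay is modeled as a shift by a whole number of time steps with the earliest point repeated as padding, the noise is applied independently to each coordinate, and both function names are hypothetical.

```python
import random

def apply_delay(traj, delay_steps):
    """Shift a trajectory back by delay_steps, so the receiver sees stale points.

    Assumption: the earliest known point is repeated to fill the gap.
    """
    if delay_steps <= 0:
        return list(traj)
    k = min(delay_steps, len(traj))
    return [traj[0]] * k + list(traj[:len(traj) - k])

def add_gaussian_noise(traj, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to x and y."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in traj]

# Example: one historical trajectory observed through both sources.
history = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
communicated = apply_delay(history, delay_steps=2)
sensed = add_gaussian_noise(history, sigma=0.1, rng=rng)
```

Here `communicated` lags the true positions by two steps, while `sensed` stays time-aligned but jitters around them, which is the kind of complementary error a fusion module has to reconcile.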