Spatio-Temporal Graph Dual-Attention Network for Multi-Agent Prediction and Tracking

Jiachen Li, Hengbo Ma, Zhihao Zhang, Jinning Li, Masayoshi Tomizuka

TL;DR

STG-DAT, a generic generative neural system for multi-agent trajectory prediction involving heterogeneous agents, takes a step toward explicit interaction modeling by incorporating relational inductive biases with a dynamic graph representation, and it leverages both trajectory and scene context information.

Abstract

An effective understanding of the environment and accurate trajectory prediction of surrounding dynamic obstacles are indispensable for intelligent mobile systems (e.g., autonomous vehicles and social robots) to achieve safe and high-quality planning when they navigate highly interactive and crowded scenarios. Because of frequent interactions and uncertainty in how the scene evolves, the prediction system should support relational reasoning over different entities and provide a distribution of future trajectories for each agent. In this paper, we propose a generic generative neural system (called STG-DAT) for multi-agent trajectory prediction involving heterogeneous agents. The system takes a step toward explicit interaction modeling by incorporating relational inductive biases with a dynamic graph representation, and it leverages both trajectory and scene context information. We also employ an efficient kinematic constraint layer for vehicle trajectory prediction. The constraint not only ensures physical feasibility but also enhances model performance. Moreover, the proposed prediction model can easily be incorporated into multi-target tracking frameworks, and empirical results show that it improves tracking accuracy. The proposed system is evaluated on three public benchmark datasets for trajectory prediction, where the agents include pedestrians, cyclists, and on-road vehicles. The experimental results demonstrate that our model outperforms various baseline approaches in terms of prediction and tracking accuracy.
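The abstract describes relational reasoning over a dynamic graph. As a rough illustration only, the Python sketch below rebuilds the edges from a distance threshold at each time step and aggregates neighbor features with single-head scaled dot-product attention. The radius rule and the `build_edges` and `attend` helpers are hypothetical. This is not the paper's graph dual-attention layer, whose details are not given here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def build_edges(positions: Sequence[Point], radius: float) -> Dict[int, List[int]]:
    """Connect agents closer than `radius` (hypothetical rule).

    Calling this again at every time step yields a dynamic graph whose
    topology changes as agents move.
    """
    n = len(positions)
    return {
        i: [j for j in range(n) if j != i and math.dist(positions[i], positions[j]) < radius]
        for i in range(n)
    }


def attend(features: Sequence[Sequence[float]], edges: Dict[int, List[int]], i: int) -> List[float]:
    """Attention-weighted sum of node i's own and its neighbors' feature vectors."""
    nbrs = [i] + edges[i]  # include a self-loop so isolated agents keep their features
    d = len(features[i])
    # Scaled dot-product score between node i and each neighbor.
    scores = [sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(d) for j in nbrs]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * features[j][k] for w, j in zip(weights, nbrs)) for k in range(d)]


positions = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = build_edges(positions, radius=2.0)
print(edges)                       # {0: [1], 1: [0], 2: []}
print(attend(features, edges, 2))  # [1.0, 1.0]
```

Because the softmax weights sum to one, each updated node attribute is a convex combination of its own features and those of the agents it currently interacts with.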

Paper Structure

This paper contains 30 sections, 26 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Typical traffic scenarios with large uncertainty and interactions among multiple entities. The left column is taken from interactiondataset. The upper figure in the first column was captured in a highway ramp-merging scenario, where lane changes involving negotiation happen frequently. The lower figure was captured in a roundabout and an unsignalized intersection scenario, where yielding and stopping behaviors happen frequently. The other two columns show the occupancy density maps and the velocity fields of the scenarios, which are generated from the training data to provide statistical context information.
  • Figure 2: The detailed architecture of STG-DAT, which consists of three key components: (a) A deep feature extractor, which extracts state, relation, and context features from the trajectories of agents and the sequences of occupancy density maps and velocity fields. The red dashed lines indicate shared parameters. (b) An encoder, which includes a graph dual-attention network that processes spatio-temporal graphs and generates abstract node attributes containing interaction information, and an encoding function that maps the node attributes to a latent space. The encoding function is not used during the testing phase. (c) A decoder, which samples future trajectory hypotheses that satisfy physical constraints for each agent. The bottom portion of the figure presents some details of (a)-(c). $||$ denotes the concatenation operation. MLP refers to multi-layer perceptron. CNN refers to convolutional neural network.
  • Figure 3: The diagram of the kinematic bicycle model adopted from kong2015kinematic. The model equations are provided in Eq. (\ref{eq:discretesystem}).
  • Figure 4: The diagram of the recurrent decoder with the kinematic constraint layer. The recurrent process consists of two phases: a burn-in phase and a prediction phase. In the burn-in phase, the historical ground truth is used as the GRU input at each step to initialize the hidden state. In the prediction phase, the position output at each step serves as the input to the next step. The iteration continues until the prediction horizon is reached.
  • Figure 5: Qualitative results on the SDD dataset. The green mask represents the predicted distribution, and the yellow, blue, and red lines represent the historical observation, the ground truth, and the trajectory hypothesis with the smallest error among those sampled from the distribution, respectively.
  • ...and 2 more figures
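Figure 1 mentions occupancy density maps and velocity fields generated from the training data. A minimal sketch, assuming these are per-cell visit frequencies and per-cell mean velocities on a square grid (the `context_maps` function, the grid layout, and the `(x, y, vx, vy)` sample format are all hypothetical):

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float, float, float]  # (x, y, vx, vy) taken from training trajectories


def context_maps(samples: Sequence[Sample], origin: Tuple[float, float], size: float, cell: float):
    """Return (density, velocity) grids for a square region of side `size`.

    density[r][c]  : fraction of all in-region samples that fall in cell (r, c)
    velocity[r][c] : mean (vx, vy) of those samples, or (0.0, 0.0) if the cell is empty
    """
    n = int(size / cell)
    counts = [[0] * n for _ in range(n)]
    vsum = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    for x, y, vx, vy in samples:
        r = int((y - origin[1]) // cell)  # row index from y
        c = int((x - origin[0]) // cell)  # column index from x
        if 0 <= r < n and 0 <= c < n:  # ignore samples outside the region
            counts[r][c] += 1
            vsum[r][c][0] += vx
            vsum[r][c][1] += vy
    total = sum(map(sum, counts))
    density = [[k / total if total else 0.0 for k in row] for row in counts]
    velocity = [
        [(s[0] / k, s[1] / k) if k else (0.0, 0.0) for s, k in zip(srow, krow)]
        for srow, krow in zip(vsum, counts)
    ]
    return density, velocity
```

The resulting grids are image-like, which is consistent with the caption of Figure 2 describing a CNN that extracts context features from sequences of these maps.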
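Figure 3 refers to the kinematic bicycle model. The sketch below uses the standard continuous-time equations from Kong et al. (2015), with state (x, y, ψ, v), acceleration a, front steering angle δ_f, and axle distances l_f and l_r, discretized with a forward-Euler step. It assumes the decoder emits acceleration and steering rather than positions, so that every rolled-out trajectory obeys the model. The discretization, the axle lengths, and the action bounds are assumptions and may differ from the paper's Eq. (eq:discretesystem).

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    x: float    # position of the center of mass [m]
    y: float
    psi: float  # heading angle [rad]
    v: float    # speed [m/s]


def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))


def bicycle_step(s: State, accel: float, steer: float, dt: float,
                 l_f: float = 1.5, l_r: float = 1.5,
                 max_steer: float = 0.6, max_accel: float = 3.0) -> State:
    """One forward-Euler step of the kinematic bicycle model.

    `accel` and `steer` (front-wheel angle) are the actions a decoder would emit.
    The axle distances and action bounds are hypothetical placeholder values.
    """
    steer = clamp(steer, -max_steer, max_steer)  # keep actions physically plausible
    accel = clamp(accel, -max_accel, max_accel)
    beta = math.atan(l_r / (l_f + l_r) * math.tan(steer))  # slip angle at the center of mass
    return State(
        x=s.x + s.v * math.cos(s.psi + beta) * dt,
        y=s.y + s.v * math.sin(s.psi + beta) * dt,
        psi=s.psi + s.v / l_r * math.sin(beta) * dt,
        v=max(0.0, s.v + accel * dt),  # no reversing in this sketch
    )


print(bicycle_step(State(0.0, 0.0, 0.0, 10.0), accel=0.0, steer=0.0, dt=0.1))
# State(x=1.0, y=0.0, psi=0.0, v=10.0)
```

Clamping the actions before integration is one simple way to guarantee feasibility. A learned layer would instead produce bounded outputs, for example through a scaled tanh.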
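The two-phase recurrent process in Figure 4 can be sketched as follows. A toy exponential-smoothing cell (`toy_cell`, hypothetical) stands in for the GRU, and a hypothetical output head adds the smoothed displacement to the current position. Only the control flow, ground-truth inputs during burn-in followed by feeding each prediction back as the next input, is meant to mirror the caption.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def toy_cell(hidden: Point, disp: Point, alpha: float = 0.5) -> Point:
    """Hypothetical stand-in for a GRU cell: exponential smoothing of displacements."""
    return (alpha * hidden[0] + (1 - alpha) * disp[0],
            alpha * hidden[1] + (1 - alpha) * disp[1])


def decode(history: List[Point], horizon: int) -> List[Point]:
    hidden: Point = (0.0, 0.0)
    # Burn-in phase: observed ground-truth positions drive the cell,
    # so the hidden state is initialized from real history.
    for prev, cur in zip(history, history[1:]):
        hidden = toy_cell(hidden, (cur[0] - prev[0], cur[1] - prev[1]))
    # Prediction phase: each predicted position becomes the next input
    # until the prediction horizon is reached.
    cur = history[-1]
    preds: List[Point] = []
    for _ in range(horizon):
        nxt = (cur[0] + hidden[0], cur[1] + hidden[1])  # hypothetical output head
        preds.append(nxt)
        hidden = toy_cell(hidden, (nxt[0] - cur[0], nxt[1] - cur[1]))
        cur = nxt
    return preds


print(decode([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)], horizon=3))
# [(3.875, 0.0), (4.75, 0.0), (5.625, 0.0)]
```

In a vehicle model, the output head would emit actions that pass through a kinematic layer before producing the next position. The loop structure stays the same.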
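Figure 5 plots the sampled hypothesis with the smallest error. Assuming that error is the average displacement error (ADE) over the prediction horizon, the selection can be sketched as below. The `ade` and `best_of_k` helpers are hypothetical names. Best-of-K selection of this kind is a common way to evaluate generative trajectory predictors.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def ade(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Average displacement error: mean Euclidean distance over the horizon."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def best_of_k(hypotheses: Sequence[Sequence[Point]], gt: Sequence[Point]) -> Tuple[int, float]:
    """Index and ADE of the sampled hypothesis closest to the ground truth."""
    errors = [ade(h, gt) for h in hypotheses]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors[best]


gt = [(1.0, 0.0), (2.0, 0.0)]
samples = [[(1.0, 1.0), (2.0, 1.0)], [(1.0, 0.0), (2.0, 0.5)]]
print(best_of_k(samples, gt))  # (1, 0.25)
```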