
ProIn: Learning to Predict Trajectory Based on Progressive Interactions for Autonomous Driving

Yinke Dong, Haifeng Yuan, Hongkun Liu, Wei Jing, Fangzhen Li, Hongmin Liu, Bin Fan

TL;DR

A progressive interaction network is proposed that lets the agent's feature progressively focus on the relevant parts of the map, so that the learned agent feature representation better captures the relevant map constraints.

Abstract

Accurate motion prediction of pedestrians, cyclists, and other surrounding vehicles (collectively called agents) is essential for autonomous driving. Most existing works capture map information through a one-stage interaction with the map using vector-based attention, which provides map constraints for social interaction and multi-modal differentiation. However, these methods must encode all required map rules into the focal agent's feature, so that it retains the paths of all possible intentions while also adapting to potential social interactions. In this work, a progressive interaction network is proposed that lets the agent's feature progressively focus on the relevant parts of the map, so that the learned agent feature representation better captures the relevant map constraints. The network progressively encodes the complex influence of map constraints into the agent's feature through graph convolutions at three stages: after the historical trajectory encoder, after social interaction, and after multi-modal differentiation. In addition, a weight allocation mechanism is proposed for multi-modal training, so that each mode can obtain learning opportunities from a single-mode ground truth. Experiments validate the superiority of progressive interactions over the existing one-stage interaction and demonstrate the effectiveness of each component. Encouraging results were obtained on challenging benchmarks.
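The three-stage ordering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the attention-style aggregation standing in for the graph convolutions, the mean-pooling placeholder for social interaction, and the additive mode embeddings are all assumptions made purely to show where the map is consulted.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def map_to_agent(agent_feat, map_feats, W):
    # Hypothetical map-to-agent interaction: an attention-weighted sum of
    # map node features added to the agent feature (a stand-in for the
    # graph convolution used in the paper).
    scores = map_feats @ (W @ agent_feat)        # (M,)
    attn = softmax(scores)                       # (M,)
    return agent_feat + attn @ map_feats         # (d,)

def progressive_interaction(focal, neighbors, map_feats, mode_embeds, Ws):
    # Stage 1: consult the map right after the historical trajectory encoder.
    h = map_to_agent(focal, map_feats, Ws[0])
    # Placeholder social interaction (assumption): add the mean neighbor feature.
    h = h + neighbors.mean(axis=0)
    # Stage 2: consult the map again after social interaction.
    h = map_to_agent(h, map_feats, Ws[1])
    # Placeholder multi-modal differentiation (assumption): one embedding per mode.
    modes = h[None, :] + mode_embeds             # (K, d)
    # Stage 3: each mode consults the map separately.
    return np.stack([map_to_agent(m, map_feats, Ws[2]) for m in modes])

rng = np.random.default_rng(0)
d, M, N, K = 8, 20, 3, 6                         # K = 6 branches, as in Figure 2
out = progressive_interaction(
    focal=rng.normal(size=d),
    neighbors=rng.normal(size=(N, d)),
    map_feats=rng.normal(size=(M, d)),
    mode_embeds=rng.normal(size=(K, d)),
    Ws=[rng.normal(size=(d, d)) * 0.1 for _ in range(3)],
)
print(out.shape)
```

The point of the sketch is only the ordering: the same map features are queried three times, each time by an agent feature that already carries more context (history, then neighbors, then a specific mode).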

Paper Structure (15 sections, 8 equations, 5 figures, 5 tables)

This paper contains 15 sections, 8 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Predictions generated by (a) the one-stage model and (b) our model in Table \ref{tab:ablation study of M2A}. (a) shows a failure case in which an agent (yellow) seems to forget the map because of its social interaction with other agents (blue): one predicted trajectory (green) goes straight ahead and violates the map rules, and another predicted left-turn trajectory (green) enters the wrong lane. (b) shows more reasonable forecasts from our progressive interaction model.
  • Figure 2: The pipeline of our method. It first extracts features of agents and map independently, then uses GCNs to implement a series of interactions between agents and map, and finally uses six branches to generate multi-modal trajectories.
  • Figure 3: Illustration of neighbors in Map-Agent interaction with (a) a fixed range and (b) the proposed dynamic range, where $\mathbf{p}(T)+D$ is the center of a circle with radius $|D|+ \delta$.
  • Figure 4: Visualization of the attention of a focal agent on map nodes at different progressive interaction stages. Larger blue dots indicate higher attention on the map nodes. The last two images in the first row show the changes in attention caused by neighboring trajectories. The images in the second row illustrate the differences in attention for three different modes. Neighbor agents without a ground-truth trajectory are static.
  • Figure 5: Qualitative results of the proposed model on the Argoverse1 validation set. Please refer to Figure \ref{fig:att_sence} for the meanings of colors and symbols.
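
The dynamic range in the Figure 3 caption (a circle centered at $\mathbf{p}(T)+D$ with radius $|D|+\delta$) can be sketched as a simple neighbor filter. The caption does not define $D$ or $\delta$, so this sketch assumes $D$ is a 2-D displacement vector from the agent's last observed position $\mathbf{p}(T)$ and $\delta$ is a scalar margin; the function name and the example values are hypothetical.

```python
import numpy as np

def dynamic_range_neighbors(map_nodes, p_T, D, delta):
    # Circle center: last observed position shifted by the displacement D.
    center = p_T + D
    # Circle radius: length of D plus the margin delta.
    radius = np.linalg.norm(D) + delta
    # Distance of every map node to the circle center.
    dist = np.linalg.norm(map_nodes - center, axis=1)
    # Indices of map nodes that fall inside the dynamic range.
    return np.nonzero(dist <= radius)[0]

# Hypothetical example: agent at the origin, displaced 10 m along x, 2 m margin.
map_nodes = np.array([[10.0, 0.0], [0.0, 0.0], [-5.0, 0.0], [20.0, 0.0], [23.0, 0.0]])
idx = dynamic_range_neighbors(map_nodes, p_T=np.array([0.0, 0.0]),
                              D=np.array([10.0, 0.0]), delta=2.0)
print(idx)  # [0 1 3]
```

Under these assumptions the circle always contains the agent's current position (its distance to the center is $|D|$, which is below the radius) while extending toward where the agent is heading, which is the contrast with the fixed range in panel (a).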