
Soft Attention: Does it Actually Help to Learn Social Interactions in Pedestrian Trajectory Prediction?

Laurent Boucaud, Daniel Aloise, Nicolas Saunier

TL;DR

This paper studies whether deep-learning models that use a soft-attention mechanism for social interaction modeling actually use social information at prediction time, and demonstrates that the soft-attention mechanism, and therefore the social information, is ignored by the models.

Abstract

We consider the problem of predicting the future path of a pedestrian using its motion history and the motion history of the surrounding pedestrians, called social information. Since the seminal paper on Social-LSTM, deep learning has become the main tool used to model the impact of social interactions on a pedestrian's motion. The demonstration that these models can learn social interactions relies on an ablation study: the models are compared with and without their social interaction module on two standard metrics, the Average Displacement Error and the Final Displacement Error. Yet these complex models were recently outperformed by a simple constant-velocity approach. This calls into question both whether they actually model social interactions and the validity of that demonstration. In this paper, we focus on deep-learning models with a soft-attention mechanism for social interaction modeling and study whether they use social information at prediction time. We conduct two experiments across four state-of-the-art approaches on the ETH and UCY datasets, which were also used in previous work. First, the models are trained with the social information replaced by random noise and compared to models trained with actual social information. Second, we use a gating mechanism along with an $L_0$ penalty, allowing models to shut down their inner components. The models consistently learn to prune their soft-attention mechanism. In both experiments, neither the course of convergence nor the prediction performance was altered. This demonstrates that the soft-attention mechanism, and therefore the social information, is ignored by the models.
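As an illustration of the two metrics named in the abstract, the following is a minimal sketch in plain Python. It assumes the usual definitions: the Average Displacement Error is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and the Final Displacement Error is that distance at the last step. The function names and the form of the constant-velocity baseline (extrapolating the last observed displacement) are assumptions made for this example, not taken from the paper.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for one trajectory given as lists of (x, y) points."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between prediction and ground truth at each time step.
    distances = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    ade = sum(distances) / len(distances)  # mean over all predicted steps
    fde = distances[-1]                    # error at the final step only
    return ade, fde


def constant_velocity_forecast(history, horizon):
    """Hypothetical baseline: repeat the last observed displacement."""
    if len(history) < 2:
        raise ValueError("need at least two observed positions")
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0  # displacement per time step
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]
```

For example, `constant_velocity_forecast([(0.0, 0.0), (1.0, 0.0)], 3)` extrapolates along the x-axis, and passing that forecast with the true future positions to `displacement_errors` scores it on both metrics.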

Paper Structure

This paper contains 16 sections, 8 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Representation of the generic architecture. Past motion histories are fed into the trajectory module and passed to the social module (dotted line). The trajectory module outputs a representation of the main pedestrian's past trajectory (in red). The social module outputs a social-context tensor (in black). Both tensors are fed to the prediction module.
  • Figure 2: Illustration of the gating mechanism on a simplified version of the generic architecture. The feature vector $\widetilde{T}_i$ of the main pedestrian $i$ is obtained by multiplying the output of the trajectory module $T_{i}$ with the binary gate $g_\tau$. Similarly, the social context vector $\widetilde{A}_i$ is obtained by multiplying the output of the social module $A_{i}$ with the binary gate $g_a$. They are then fed into the prediction module.
  • Figure 3: Evolution of gate values with the number of epochs when fine-tuning social models with a gating mechanism and $L_0$ penalty term. Results are shown when evaluating on scene zara2.
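The gating described in the Figure 2 caption can be sketched as follows. This is a minimal plain-Python illustration, assuming that each gate is a single binary scalar multiplying its whole module output, and that the $L_0$ penalty is the literal count of open gates scaled by a weight. How the paper makes this penalty trainable is not stated in this excerpt, so the helper names and the `weight` parameter are hypothetical.

```python
def apply_gate(vector, gate):
    """Multiply every component of a feature vector by a scalar gate."""
    return [gate * value for value in vector]


def gated_features(T_i, A_i, g_tau, g_a):
    """Gate the trajectory features T_i and the social context A_i.

    g_tau and g_a are assumed to be binary (0 or 1): a closed gate zeroes
    the corresponding module output before the prediction module sees it.
    """
    T_tilde = apply_gate(T_i, g_tau)
    A_tilde = apply_gate(A_i, g_a)
    return T_tilde, A_tilde


def l0_penalty(gates, weight=1.0):
    """Literal L0 norm of the gates: the number of gates left open."""
    return weight * sum(1 for g in gates if g != 0)
```

With `g_tau = 1` and `g_a = 0`, the social context vector is zeroed out while the trajectory features pass through unchanged, which corresponds to a model that has pruned its social module.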