Studying the Effect of Explicit Interaction Representations on Learning Scene-level Distributions of Human Trajectories
Anna Mészáros, Javier Alonso-Mora, Jens Kober
TL;DR
This work addresses predicting the joint future trajectories of multiple agents in driving scenes by learning a graph-based interaction structure within a normalizing flow framework. The Graph-based Motion Prediction (GMoP) model factorizes the joint distribution via a DAG, enabling variable scene sizes and conditioning on context, while systematically comparing seven interaction representations including several heuristic schemes. Across four real-world datasets, GMoP variants show that explicit, well-chosen interaction structures often improve distribution fitness (NLL) and can beat distribution-free baselines, though results are dataset-dependent and sometimes data-driven independence performs best for NLL. The study highlights the importance of informed interaction modeling, especially in data-sparse or highly interactive scenarios, and outlines future directions toward incorporating temporal dynamics of interactions and agent-type-specific mechanisms.
Abstract
Effectively capturing the joint distribution of all agents in a scene is relevant for predicting the true evolution of the scene and in turn providing more accurate information to the decision processes of autonomous vehicles. While new models have been developed for this purpose in recent years, it remains unclear how to best represent the joint distributions particularly from the perspective of the interactions between agents. Thus far there is no clear consensus on how best to represent interactions between agents; whether they should be learned implicitly from data by neural networks, or explicitly modeled using the spatial and temporal relations that are more grounded in human decision-making. This paper aims to study various means of describing interactions within the same network structure and their effect on the final learned joint distributions. Our findings show that more often than not, simply allowing a network to establish interactive connections between agents based on data has a detrimental effect on performance. Instead, having well defined interactions (such as which agent of an agent pair passes first at an intersection) can often bring about a clear boost in performance.
