HOME: Heatmap Output for future Motion Estimation

Thomas Gilles, Stefano Sabatini, Dzmitry Tsishkou, Bogdan Stanciulescu, Fabien Moutarde

TL;DR

HOME reframes motion forecasting as predicting a 2D heatmap over the agent's final future position, giving a full probabilistic representation of multimodal futures. The method combines CNN-based encoding of the rasterized context, inter-agent attention, and a heatmap decoder, followed by two sampling algorithms that optimize either Miss Rate or Final Displacement Error without retraining. A separate trajectory generator then converts each sampled end-point into a full trajectory, so every predicted motion is conditioned on its end-point. On Argoverse, HOME achieves a state-of-the-art Miss Rate with 6 predicted modalities and competitive displacement metrics, and ablations indicate that the heatmap representation and the sampling strategies give good coverage and a controllable trade-off between the two metrics.

Abstract

In this paper, we propose HOME, a framework tackling the motion forecasting problem with an image output representing the probability distribution of the agent's future location. This method allows for a simple architecture with classic convolutional networks coupled with an attention mechanism for agent interactions, and outputs an unconstrained 2D top-view representation of the agent's possible future. Based on this output, we design two methods to sample a finite set of the agent's future locations. These methods allow us to control the optimization trade-off between miss rate and final displacement error for multiple modalities without having to retrain any part of the model. We apply our method to the Argoverse Motion Forecasting Benchmark and achieve 1st place on the online leaderboard.
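To make the idea of sampling a finite set of final locations from a heatmap concrete, here is a minimal sketch of one possible greedy strategy: repeatedly pick the pixel whose surrounding disc covers the most remaining probability mass, then zero out that disc so the next pick lands on a different mode. This is an illustrative assumption, not necessarily either of the two sampling methods the paper designs. The function names, the list-of-lists heatmap layout, and the pixel `radius` parameter are all hypothetical.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # assumed layout: heatmap[y][x] holds a probability


def disc_offsets(radius: int) -> List[Tuple[int, int]]:
    """Integer (dy, dx) offsets that fall inside a disc of the given pixel radius."""
    return [
        (dy, dx)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
        if dy * dy + dx * dx <= radius * radius
    ]


def sample_endpoints_greedy(heatmap: Heatmap, k: int, radius: int) -> List[Tuple[int, int]]:
    """Greedily pick k pixels (y, x), each covering the most remaining probability mass."""
    h, w = len(heatmap), len(heatmap[0])
    remaining = [row[:] for row in heatmap]  # work on a copy, leave the input intact
    offsets = disc_offsets(radius)
    picks: List[Tuple[int, int]] = []
    for _ in range(k):
        best, best_mass = (0, 0), -1.0
        for y in range(h):
            for x in range(w):
                # Probability mass inside the disc centred on (y, x), clipped to the image.
                mass = sum(
                    remaining[y + dy][x + dx]
                    for dy, dx in offsets
                    if 0 <= y + dy < h and 0 <= x + dx < w
                )
                if mass > best_mass:
                    best, best_mass = (y, x), mass
        picks.append(best)
        # Zero out the covered disc so the next pick targets another mode.
        by, bx = best
        for dy, dx in offsets:
            yy, xx = by + dy, bx + dx
            if 0 <= yy < h and 0 <= xx < w:
                remaining[yy][xx] = 0.0
    return picks
```

Because the sampler only reads the heatmap, the number of modalities `k` and the coverage radius can be changed at inference time without touching the network that produced the heatmap. The brute-force scan is quadratic in the image size per pick; a practical version would likely replace it with a convolution.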

Paper Structure

This paper contains 23 sections, 5 equations, 7 figures, 3 tables, 2 algorithms.

Figures (7)

  • Figure 1: Summary of our approach. The yellow/red heatmap is our predicted probability distribution and the blue points are the sampled final point predictions.
  • Figure 2: HOME pipeline. a) Context map, target agent (blue) and neighbor (green) trajectories are given as input to the network. b) Heatmap output of the network. c) Sampled final points. d) Trajectories are built for each final point.
  • Figure 3: Example of input and output data for our model, with a brief description of the architecture.
  • Figure 4: Illustration of sampling methods.
  • Figure 5: Effect of the maximum number $k$ of modalities used in training on the metrics for lower, fixed numbers of modalities. Solid lines are results of the regression output model. Dashed lines are results of our heatmap output model. We show the Miss Rate for the total number of predicted modalities $k$ (blue) and for fixed numbers of modalities 1 (orange), 3 (green) and 6 (red).
  • ...and 2 more figures