HOME: Heatmap Output for future Motion Estimation
Thomas Gilles, Stefano Sabatini, Dzmitry Tsishkou, Bogdan Stanciulescu, Fabien Moutarde
TL;DR
HOME reframes motion forecasting as predicting a 2D heatmap over the agent's final future position, which gives a complete probabilistic representation of multimodal futures. The method combines CNN-based encoding of the rasterized context, inter-agent attention, and a heatmap decoder, followed by two sampling algorithms that optimize either Miss Rate or Final Displacement Error without retraining. A separate trajectory generator converts each sampled endpoint into a full trajectory, so that every predicted motion is coherent with its endpoint. On Argoverse, HOME achieves a state-of-the-art Miss Rate with six predictions (MR6) and competitive displacement metrics. Ablations indicate that the heatmap representation and the sampling strategies provide robust coverage and a controllable trade-off between the two metrics.
Abstract
In this paper, we propose HOME, a framework that tackles the motion forecasting problem with an image output representing the probability distribution of the agent's future location. This method allows for a simple architecture that couples classic convolutional networks with an attention mechanism for agent interactions, and it outputs an unconstrained 2D top-view representation of the agent's possible futures. Based on this output, we design two methods to sample a finite set of possible future locations for the agent. These methods allow us to control the optimization trade-off between miss rate and final displacement error for multiple modalities without having to retrain any part of the model. We apply our method to the Argoverse Motion Forecasting Benchmark and achieve 1st place on the online leaderboard.
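To make the idea of sampling a finite set of endpoints from a heatmap concrete, here is a minimal sketch of one coverage-oriented strategy: repeatedly pick the pixel whose surrounding disk covers the most probability mass, then zero that disk out so the next pick lands elsewhere. It illustrates the general principle only and is not necessarily the exact procedure used in HOME. The function names (`covered_mass`, `sample_endpoints`), the pixel-unit `radius`, and the toy heatmap are all assumptions introduced for this example.

```python
import numpy as np


def covered_mass(heatmap, radius):
    """For every pixel, sum the heatmap values inside a disk centred on it."""
    r = int(radius)
    rows, cols = heatmap.shape
    padded = np.pad(heatmap, r)  # zero padding so disks may overhang the border
    out = np.zeros_like(heatmap)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy * dy + dx * dx <= radius * radius:
                # Shifted view: out[i, j] accumulates heatmap[i + dy, j + dx]
                out += padded[r + dy:r + dy + rows, r + dx:r + dx + cols]
    return out


def sample_endpoints(heatmap, k=6, radius=2):
    """Greedily pick k (row, col) endpoints that cover the most probability mass.

    Hypothetical helper: `radius` is in pixels and stands in for a
    miss-distance threshold expressed in heatmap resolution.
    """
    h = heatmap.astype(float).copy()
    yy, xx = np.mgrid[:h.shape[0], :h.shape[1]]
    picks = []
    for _ in range(k):
        cover = covered_mass(h, radius)
        i, j = np.unravel_index(np.argmax(cover), cover.shape)
        picks.append((int(i), int(j)))
        # Remove the mass already covered so later picks explore other modes
        h[(yy - i) ** 2 + (xx - j) ** 2 <= radius * radius] = 0.0
    return picks


# Toy heatmap with two separated modes (assumed values, for illustration only)
toy = np.zeros((20, 20))
toy[5, 5] = 0.6
toy[15, 15] = 0.4
endpoints = sample_endpoints(toy, k=2, radius=2)
```

Because the heatmap is only read, never re-predicted, changing `k` or `radius` needs no retraining. This is the property the abstract relies on when trading miss rate against final displacement error.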
