Hierarchical Light Transformer Ensembles for Multimodal Trajectory Forecasting
Adrien Lafage, Mathieu Barbier, Gianni Franchi, David Filliat
TL;DR
The paper addresses multimodal trajectory forecasting for safety-critical systems by introducing Hierarchical Light Transformer Ensembles (HLT-Ens), which combine a hierarchical density representation with an efficient ensembling framework built on grouped Transformer layers. A novel Hierarchical Winner-Takes-All (HWTA) loss trains a two-level mixture model made of meta-modes and their sub-modes, while Grouped Fully-Connected (GFC) and Grouped Multi-head Attention (GMHA) layers implement lightweight, diverse subnetworks. The resulting meta-mode-level predictions allow the prediction set to be compressed quickly while preserving coverage of the multimodal distribution. Experiments on Argoverse 1 and Interaction report state-of-the-art results at substantially lower computational cost than traditional deep ensembles, which suggests practical potential for real-time, uncertainty-aware trajectory forecasting. Overall, the work integrates hierarchical density modeling with Transformer-based ensembling to produce scalable, interpretable multimodal forecasts.
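To make the two-level structure concrete, here is a minimal sketch of a hierarchical winner-takes-all selection. It is not the paper's HWTA definition, which this summary does not spell out. It assumes, purely for illustration, that a meta-mode is scored by the mean error of its sub-modes and that the winner is the closest sub-mode inside the best meta-mode. The names `ade` and `two_level_wta` and the toy trajectories are hypothetical.

```python
import math


def ade(pred, target):
    """Average displacement error between two trajectories of (x, y) points."""
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(target)


def two_level_wta(meta_modes, target):
    """Return (meta index, sub index, error) of the winning sub-mode.

    Assumption (not from the paper): a meta-mode is scored by the mean
    error of its sub-modes; the winning sub-mode is then the closest one
    inside the winning meta-mode.
    """
    errors = [[ade(sub, target) for sub in subs] for subs in meta_modes]
    meta_scores = [sum(e) / len(e) for e in errors]
    k = min(range(len(meta_scores)), key=meta_scores.__getitem__)
    j = min(range(len(errors[k])), key=errors[k].__getitem__)
    return k, j, errors[k][j]


# Two hypothetical meta-modes ("go straight", "turn left"), two sub-modes each.
straight = [[(1.0, 0.0), (2.0, 0.0)], [(1.0, 0.1), (2.0, 0.2)]]
left = [[(1.0, 0.5), (1.5, 1.5)], [(1.0, 0.6), (1.4, 1.6)]]
target = [(1.0, 0.1), (2.0, 0.1)]

k, j, err = two_level_wta([straight, left], target)
print(k, j, round(err, 3))
```

In a training loop built on this sketch, only the winning sub-mode's error would be back-propagated, as in a standard winner-takes-all scheme; how the actual HWTA loss weights the meta-mode and sub-mode terms is left to the paper.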
Abstract
Accurate trajectory forecasting is crucial for the performance of various systems, such as advanced driver-assistance systems and self-driving vehicles. These forecasts allow us to anticipate events that lead to collisions and, therefore, to mitigate them. Deep Neural Networks have excelled in motion forecasting, but they remain prone to overconfidence and weak uncertainty quantification. Deep Ensembles address these concerns, yet applying them to multimodal distributions remains challenging. In this paper, we propose a novel approach named Hierarchical Light Transformer Ensembles (HLT-Ens), which efficiently trains an ensemble of Transformer architectures using a novel hierarchical loss function. HLT-Ens leverages grouped fully connected layers, inspired by grouped convolution techniques, to capture multimodal distributions effectively. Through extensive experimentation, we demonstrate that HLT-Ens achieves state-of-the-art performance, offering a promising avenue for improving trajectory forecasting techniques.
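The idea of a grouped fully connected layer can be illustrated with a minimal pure-Python sketch. It assumes the layer follows the usual grouped-convolution pattern: the input features are split into equal chunks and each chunk goes through its own small dense layer, so the groups behave like independent subnetworks. The function names and sizes below are hypothetical and are not taken from the paper's implementation.

```python
import random


def dense(x, weight, bias):
    """Plain fully connected layer: y[j] = sum_i weight[j][i] * x[i] + bias[j]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def grouped_dense(x, weights, biases):
    """Split x into len(weights) equal chunks; apply one dense layer per chunk."""
    groups = len(weights)
    if len(x) % groups != 0:
        raise ValueError("input size must be divisible by the number of groups")
    chunk = len(x) // groups
    out = []
    for g in range(groups):
        out.extend(dense(x[g * chunk:(g + 1) * chunk], weights[g], biases[g]))
    return out


def init_group(rng, out_features, in_features):
    """Random weights and biases for one group (illustrative initialization)."""
    weight = [[rng.uniform(-1.0, 1.0) for _ in range(in_features)]
              for _ in range(out_features)]
    bias = [rng.uniform(-1.0, 1.0) for _ in range(out_features)]
    return weight, bias


rng = random.Random(0)
groups, in_per_group, out_per_group = 4, 8, 8  # hypothetical sizes
params = [init_group(rng, out_per_group, in_per_group) for _ in range(groups)]
weights = [w for w, _ in params]
biases = [b for _, b in params]

x = [rng.uniform(-1.0, 1.0) for _ in range(groups * in_per_group)]
y = grouped_dense(x, weights, biases)

# Parameter count: grouped layer vs. one dense layer of the same total width.
grouped_params = groups * (in_per_group * out_per_group + out_per_group)
dense_params = (groups * in_per_group) * (groups * out_per_group) + groups * out_per_group
print(len(y), grouped_params, dense_params)
```

With these sizes the grouped layer uses 288 parameters against 1056 for a single dense layer of the same width, which is the kind of saving that makes an ensemble of grouped subnetworks cheaper than a set of independently trained full models.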
