Scaling Motion Forecasting Models with Ensemble Distillation
Scott Ettinger, Kratarth Goel, Avikalp Srivastava, Rami Al-Rfou
TL;DR
The paper addresses the challenge of achieving high-accuracy motion forecasting for autonomous systems under limited onboard compute. It introduces a general ensemble distillation framework that first constructs a large, diverse ensemble of motion forecasting models and then distills their multi-modal outputs into a compact student model, preserving accuracy while reducing compute. Empirical results on the Waymo Open Motion Dataset and Argoverse 2 show that ensembles scale performance with compute, achieving podium-level standings, while distilled students retain much of that accuracy at a fraction of the FLOPs. The approach enables real-time, high-quality trajectory prediction for robotics and autonomous driving within strict hardware budgets, offering practical benefits for safety-critical planning and robustness in dynamic scenes.
Abstract
Motion forecasting has become an increasingly critical component of autonomous robotic systems. Onboard compute budgets typically limit the accuracy of real-time systems. In this work we propose methods for improving motion forecasting systems subject to limited compute budgets by combining model ensembling and distillation techniques. The use of ensembles of deep neural networks has been shown to improve generalization accuracy in many application domains. We first demonstrate significant performance gains by creating a large ensemble of optimized single models. We then develop a generalized framework to distill motion forecasting model ensembles into small student models which retain high performance at a fraction of the compute cost. For this study we focus on the task of motion forecasting using real-world data from autonomous driving systems. We develop ensemble models that are very competitive on the Waymo Open Motion Dataset (WOMD) and Argoverse leaderboards. From these ensembles, we train distilled student models which have high performance at a fraction of the compute cost. These experiments demonstrate distillation from ensembles as an effective method for improving the accuracy of predictive models for robotic systems with limited compute budgets.
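As a rough illustration of the two stages described above (ensembling, then distillation), the following minimal sketch pools the multi-modal outputs of several models and scores a student against the pooled modes. It is not the paper's implementation: the function names `combine_ensemble` and `distillation_loss`, the uniform member weighting, and the probability-weighted nearest-mode displacement loss are all assumptions chosen for brevity.

```python
import numpy as np


def combine_ensemble(trajs, probs):
    """Pool the modes of M ensemble members (assumed uniform member weights).

    trajs: list of M arrays, each (K, T, 2) -> K trajectories of T (x, y) points
    probs: list of M arrays, each (K,), each summing to 1
    Returns pooled trajectories (M*K, T, 2) and probabilities (M*K,) summing to 1.
    """
    all_trajs = np.concatenate(trajs, axis=0)
    all_probs = np.concatenate(probs, axis=0) / len(trajs)
    return all_trajs, all_probs


def distillation_loss(student_trajs, teacher_trajs, teacher_probs):
    """Hypothetical loss: each teacher mode is matched to its nearest student
    mode by average displacement error, weighted by the teacher probability."""
    # Pairwise differences between S student modes and N teacher modes: (S, N, T, 2)
    diff = student_trajs[:, None] - teacher_trajs[None]
    # Average displacement error over time for every pair: (S, N)
    ade = np.linalg.norm(diff, axis=-1).mean(axis=-1)
    # Distance from each teacher mode to its closest student mode: (N,)
    nearest = ade.min(axis=0)
    return float((teacher_probs * nearest).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three stand-in "models", each predicting 2 modes over 3 time steps.
    member_trajs = [rng.normal(size=(2, 3, 2)) for _ in range(3)]
    member_probs = [np.array([0.5, 0.5]) for _ in range(3)]
    teacher_trajs, teacher_probs = combine_ensemble(member_trajs, member_probs)
    # A student with fewer modes than the pooled ensemble output.
    student_trajs = teacher_trajs[:2]
    print(distillation_loss(student_trajs, teacher_trajs, teacher_probs))
```

In a real training setup the loss would be written in a differentiable framework and the student would be a compact network; the sketch only shows how pooled ensemble outputs can act as soft targets for a model with fewer modes and less compute.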
