MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation
Yuxiang Fu, Qi Yan, Lele Wang, Ke Li, Renjie Liao
TL;DR
MoFlow addresses multi-modal human trajectory forecasting by modeling $K$ correlated future paths with a conditional flow matching framework. A data-space reformulation of flow matching, combined with a multi-modal objective, yields diverse, accurate predictions, while IMLE-based distillation enables one-step sampling without sacrificing quality. The teacher model achieves state-of-the-art results on NBA, ETH-UCY, and SDD, and the IMLE-distilled student delivers comparable accuracy with about 100× faster sampling, making deployment practical for time-critical settings. The work thus unifies flow-based generation with efficient, principled distillation for probabilistic trajectory forecasting.
Abstract
In this paper, we address the problem of human trajectory forecasting, which aims to predict the inherently multi-modal future movements of humans based on their past trajectories and other contextual cues. We propose a novel motion prediction conditional flow matching model, termed MoFlow, to predict $K$-shot future trajectories for all agents in a given scene. We design a novel flow matching loss function that not only ensures at least one of the $K$ sets of future trajectories is accurate but also encourages all $K$ sets of future trajectories to be diverse and plausible. Furthermore, by leveraging implicit maximum likelihood estimation (IMLE), we propose a novel distillation method for flow models that only requires samples from the teacher model. Extensive experiments on real-world datasets, including SportVU NBA games, ETH-UCY, and SDD, demonstrate that both our teacher flow model and the IMLE-distilled student model achieve state-of-the-art performance. These models can generate diverse trajectories that are physically and socially plausible. Moreover, our one-step student model is 100 times faster than the teacher flow model during sampling. The code, model, and data are available at our project page: https://moflow-imle.github.io
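To make the distillation idea concrete, the following is a minimal sketch of the generic IMLE principle, not the objective used in the paper: for a given teacher sample, draw several candidates from the one-step student, pick the candidate closest to the teacher sample, and use that distance as the quantity to minimize. Every name here (`sq_dist`, `imle_loss`, `toy_student`), the straight-line toy generator, and the squared Euclidean distance are assumptions made for illustration only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two trajectories (lists of (x, y) points)."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))


def imle_loss(teacher_sample, student_samples):
    """Generic IMLE-style objective (assumed form, not the paper's exact loss).

    Returns the distance from the teacher sample to its nearest student
    sample, together with the index of that nearest sample.
    """
    dists = [sq_dist(teacher_sample, s) for s in student_samples]
    best = min(range(len(dists)), key=dists.__getitem__)
    return dists[best], best


def toy_student(noise, horizon=3):
    """Hypothetical one-step generator: maps a latent (dx, dy) to a straight line."""
    dx, dy = noise
    return [(dx * t, dy * t) for t in range(1, horizon + 1)]


# One teacher trajectory and eight one-step student candidates from random latents.
rng = random.Random(0)
teacher_sample = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
student_samples = [toy_student((rng.gauss(0, 1), rng.gauss(0, 1))) for _ in range(8)]

loss, nearest = imle_loss(teacher_sample, student_samples)
print(f"nearest student sample: {nearest}, loss: {loss:.4f}")
```

In an actual training loop the student would be a neural network and the selected distance would be backpropagated through the nearest candidate only. Because every teacher sample is matched to some student sample, this style of objective discourages the student from dropping modes of the teacher distribution.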
