Modeling Pedestrian Intrinsic Uncertainty for Multimodal Stochastic Trajectory Prediction via Energy Plan Denoising
Yao Liu, Quan Z. Sheng, Lina Yao
TL;DR
The paper tackles stochastic, multimodal pedestrian trajectory prediction by shifting from predicting individual paths to modeling the distribution $p(\mathbf{Y}|\mathbf{X})$ of future trajectories given the observed past. It introduces Energy Plan Denoising (EPD), which combines a Langevin Energy Model that generates a coarse initial estimate (the Plan) with a Probabilistic Diffusion Model that denoises from this Plan to produce a distribution that can be sampled, reducing the number of iterative denoising steps. EPD includes a Trajectory Distribution (TD) module to capture future-trajectory statistics, a Generative Guidance (GG) module to encode past context, and a Sample Correlation (SC) module with experience replay to align the energy model and the distribution. Empirical results on ETH/UCY and SDD show state-of-the-art performance, with ablations confirming the distinct contributions of the Langevin Energy Model (LM) and Probabilistic Diffusion (PD) components and efficiency gains from plan-guided denoising. The approach offers a practical framework for multimodal pedestrian forecasting in autonomous driving and smart-city contexts: a single denoising operation yields an explicit trajectory distribution from which many samples can be drawn, rather than one denoising run per predicted trajectory.
Abstract
Pedestrian trajectory prediction plays a pivotal role in the realms of autonomous driving and smart cities. Despite extensive prior research employing sequence and generative models, the unpredictable nature of pedestrians, influenced by their social interactions and individual preferences, presents challenges marked by uncertainty and multimodality. In response, we propose the Energy Plan Denoising (EPD) model for stochastic trajectory prediction. EPD initially provides a coarse estimation of the distribution of future trajectories, termed the Plan, utilizing the Langevin Energy Model. Subsequently, it refines this estimation through denoising via the Probabilistic Diffusion Model. By initiating denoising with the Plan, EPD effectively reduces the need for iterative steps, thereby enhancing efficiency. Furthermore, EPD differs from conventional approaches by modeling the distribution of trajectories instead of individual trajectories. This allows for the explicit modeling of pedestrian intrinsic uncertainties and eliminates the need for multiple denoising operations. A single denoising operation produces a distribution from which multiple samples can be drawn, significantly enhancing efficiency. Moreover, EPD's fine-tuning of the Plan contributes to improved model performance. We validate EPD on two publicly available datasets, where it achieves state-of-the-art results. Additionally, ablation experiments underscore the contributions of individual modules, affirming the efficacy of the proposed approach.
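The two-stage idea in the abstract (a coarse Plan from Langevin sampling, then a short denoising pass, then many samples drawn from one predicted distribution) can be illustrated with a toy sketch. This is not the authors' implementation: the quadratic energy, the `toy_denoiser` stand-in for a learned network, the Gaussian output distribution, the step counts, and the 12-step horizon are all assumptions chosen only to make the control flow concrete.

```python
import numpy as np


def langevin_plan(grad_energy, z0, n_steps=20, step_size=0.1, rng=None):
    """Short-run Langevin dynamics on an energy E(z).

    Update rule: z <- z - (s / 2) * grad E(z) + sqrt(s) * noise.
    The result plays the role of the coarse "Plan".
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        z = z - 0.5 * step_size * grad_energy(z) + np.sqrt(step_size) * noise
    return z


def denoise_from_plan(plan, denoiser, n_steps=5):
    """Refine the Plan with a few denoising steps.

    Starting from the Plan instead of pure noise is what allows the
    small n_steps here. `denoiser(x, t)` is a hypothetical callable.
    """
    x = plan
    for t in reversed(range(n_steps)):
        x = denoiser(x, t)
    return x


def sample_trajectories(mean, std, k, rng):
    """Draw k trajectories from a per-coordinate Gaussian (an assumed form)."""
    return mean + std * rng.standard_normal((k,) + mean.shape)


# --- Toy usage: every quantity below is illustrative, not from the paper ---
rng = np.random.default_rng(0)

# Hypothetical "true" mean future path: 12 steps of 2-D positions.
target = np.cumsum(np.full((12, 2), 0.4), axis=0)


def toy_grad_energy(z):
    # Gradient of the assumed energy E(z) = 0.5 * ||z - target||^2.
    return z - target


def toy_denoiser(x, t):
    # Stand-in for a learned denoising network: move halfway to the target.
    return x + 0.5 * (target - x)


plan = langevin_plan(toy_grad_energy, np.zeros((12, 2)), rng=rng)
refined_mean = denoise_from_plan(plan, toy_denoiser, n_steps=5)

# One denoising pass, then many samples from the resulting distribution.
samples = sample_trajectories(refined_mean, std=0.2, k=20, rng=rng)
print(samples.shape)
```

The point of the sketch is the cost structure: the denoising loop runs once, and drawing 20 trajectories afterwards only requires sampling from the predicted distribution, whereas a per-trajectory generative model would repeat the denoising loop for each sample.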
