SocialCVAE: Predicting Pedestrian Trajectory via Interaction Conditioned Latents
Wei Xiang, Haoteng Yin, He Wang, Xiaogang Jin
TL;DR
This work addresses pedestrian trajectory prediction with the aim of balancing accuracy and interpretability. It introduces SocialCVAE, a hybrid model that combines an energy-based interaction model with a CVAE conditioned on a socially explainable energy map, capturing multimodal, socially aware motion. A coarse motion predictor, the energy map, and the CVAE operate in a recursive framework to produce diverse future trajectories while accounting for neighbors and static/dynamic obstacles. Experiments on ETH-UCY and SDD show ADE/FDE improvements over state-of-the-art methods, supporting the value of explicitly modeling socially influenced randomness in pedestrian motion forecasting.
Abstract
Pedestrian trajectory prediction is a key technology in many applications, providing insights into human behavior and anticipating future human motion. Most existing empirical models are explicitly formulated from observed human behaviors using explainable mathematical terms and are deterministic in nature, while recent work has focused on hybrid models that add learning-based techniques for greater expressiveness while maintaining explainability. However, the deterministic nature of the steering behaviors learned from the empirical models limits these models' practical performance. To address this issue, this work proposes the social conditional variational autoencoder (SocialCVAE) for predicting pedestrian trajectories, which employs a CVAE to explore behavioral uncertainty in human motion decisions. SocialCVAE learns socially reasonable motion randomness by using a socially explainable interaction energy map, which depicts the future occupancy of each pedestrian's local neighborhood area, as the CVAE's condition. The energy map is generated by an energy-based interaction model, which anticipates the energy cost (i.e., repulsion intensity) of pedestrians' interactions with neighbors. Experimental results on two public benchmarks comprising 25 scenes demonstrate that SocialCVAE significantly improves prediction accuracy compared with state-of-the-art methods, with up to 16.85% improvement in Average Displacement Error (ADE) and 69.18% improvement in Final Displacement Error (FDE).
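For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how ADE and FDE are commonly computed; it is not taken from the paper's code. The function names `displacement_errors` and `best_of_k` are hypothetical, the toy trajectories are invented for illustration, and the best-of-K protocol (taking the minimum error over K sampled trajectories) is an assumption about how a multimodal predictor such as a CVAE is usually scored, not a detail confirmed by the abstract.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one predicted trajectory against the ground truth.

    Both arguments are equal-length sequences of (x, y) positions.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and equally long")
    # Euclidean distance between prediction and ground truth at every time step.
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    ade = sum(dists) / len(dists)  # average over the prediction horizon
    fde = dists[-1]                # error at the final time step only
    return ade, fde


def best_of_k(samples, truth):
    """Assumed best-of-K protocol: minimum ADE and minimum FDE over K samples."""
    errors = [displacement_errors(s, truth) for s in samples]
    return min(e[0] for e in errors), min(e[1] for e in errors)


# Toy data (illustrative only): a straight-line ground truth and two samples.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # drifts away from the true path
    [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],  # only the last step is off
]

print(displacement_errors(samples[0], truth))  # (1.0, 2.0)
print(best_of_k(samples, truth))
```

In this sketch the minimum ADE and minimum FDE are selected independently, so they may come from different samples; some evaluation protocols instead report both values for a single best sample.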
