Context-aware Multi-task Learning for Pedestrian Intent and Trajectory Prediction
Farzeen Munir, Tomasz Piotr Kucner
TL;DR
PTINet tackles the intertwined problems of pedestrian trajectory and crossing-intention prediction by fusing past motion with both local pedestrian attributes and global scene context in a unified multi-task framework. The architecture combines a Position-Velocity Encoding Module (LSTM-VAE), a Global Feature Module (image and optical flow via CLSTM and ResNet-50), and a Local Contextual Feature Module, feeding two decoders that jointly predict future bounding boxes and crossing probabilities. Evaluations on JAAD and PIE show state-of-the-art ADE/FDE scores across multiple prediction horizons, together with high F1-score and accuracy for intention, supporting the advantage of jointly modeling trajectory and intention with rich contextual cues. The approach shows practical potential for safer autonomous driving by enabling more accurate anticipation of pedestrian behavior in urban environments.
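For readers unfamiliar with the trajectory metrics named above, the following is a minimal sketch of how ADE (average displacement error) and FDE (final displacement error) are commonly computed. It assumes boxes are given as (x1, y1, x2, y2) in pixels and that errors are measured between box centers; the helper names `center` and `displacement_errors` are illustrative and not taken from the paper, whose exact evaluation protocol may differ.

```python
import math

def center(box):
    """Return the (x, y) center of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def displacement_errors(pred_boxes, true_boxes):
    """Return (ADE, FDE) for two equal-length sequences of boxes.

    ADE is the mean Euclidean distance between predicted and true
    centers over all future steps; FDE is that distance at the last step.
    Assumption: errors are measured on box centers, in pixels.
    """
    if len(pred_boxes) != len(true_boxes) or not pred_boxes:
        raise ValueError("sequences must be non-empty and of equal length")
    dists = [math.dist(center(p), center(t))
             for p, t in zip(pred_boxes, true_boxes)]
    return sum(dists) / len(dists), dists[-1]

# Toy example: the true pedestrian stands still, the prediction drifts away.
pred = [(0, 0, 2, 2), (3, 4, 5, 6), (6, 8, 8, 10)]
true = [(0, 0, 2, 2), (0, 0, 2, 2), (0, 0, 2, 2)]
ade, fde = displacement_errors(pred, true)
print(ade, fde)  # 5.0 10.0
```

Lower values are better for both metrics. FDE is at least as large as ADE whenever the error grows with the horizon, as in this toy example.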
Abstract
The advancement of socially-aware autonomous vehicles hinges on precise modeling of human behavior. Within this broad paradigm, a specific challenge is accurately predicting pedestrians' trajectories and intentions. Traditional methods have relied heavily on historical trajectory data, frequently overlooking vital contextual cues such as pedestrian-specific traits and environmental factors. Furthermore, trajectory and intention prediction have largely been treated as separate problems, despite their mutual dependence. To bridge this gap, we introduce PTINet (Pedestrian Trajectory and Intention Prediction Network), which jointly learns trajectory and intention prediction by combining past trajectory observations, local contextual features (individual pedestrian behaviors), and global features (signs, markings, etc.). We evaluate our approach on two widely used public datasets, JAAD and PIE, where it outperforms existing state-of-the-art models in both trajectory and intention prediction. Our experiments and ablation studies validate PTINet's effectiveness in jointly modeling intention and trajectory for pedestrian behavior modeling, and indicate the advantage of using global and local contextual features. PTINet's effectiveness in predicting pedestrian behavior paves the way for automated systems capable of interacting seamlessly with pedestrians in urban settings.
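To make the idea of jointly learning trajectory and intention concrete, here is a minimal sketch of a generic multi-task objective: a regression term on the predicted box plus a weighted binary cross-entropy term on the crossing probability. The abstract does not specify PTINet's loss, so the function names, the choice of mean squared error, and the `intent_weight` parameter are all assumptions made for illustration only.

```python
import math

def mse(pred, target):
    """Mean squared error between two equal-length coordinate lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def binary_cross_entropy(prob, label, eps=1e-7):
    """BCE for one predicted crossing probability and a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def joint_loss(pred_box, true_box, cross_prob, cross_label, intent_weight=1.0):
    """Hypothetical multi-task loss: trajectory term + weighted intention term.

    intent_weight is an assumed hyperparameter balancing the two tasks;
    it is not taken from the paper.
    """
    trajectory_term = mse(pred_box, true_box)
    intention_term = binary_cross_entropy(cross_prob, cross_label)
    return trajectory_term + intent_weight * intention_term

# A perfect box prediction with an uncertain (0.5) crossing probability
# leaves only the intention term, which equals ln 2.
loss = joint_loss([0, 0, 2, 2], [0, 0, 2, 2], cross_prob=0.5, cross_label=1)
```

Because both terms depend on shared features in a multi-task network, gradients from the intention term can shape the representation used for trajectory prediction, and vice versa. This is the general motivation for treating the two tasks together.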
