Optimizing Ego Vehicle Trajectory Prediction: The Graph Enhancement Approach
Sushil Sharma, Aryan Singh, Ganesh Sistu, Mark Halton, Ciarán Eising
TL;DR
This work tackles ego-vehicle trajectory prediction in autonomous driving by using BEV representations to encode spatial relations among scene objects. The proposed pipeline uses SAM to obtain object candidates, extracts their features with DNNs, and builds a KNN graph whose edge weights are $w_{ij}=1/d_{ij}$, where $d_{ij}$ is the distance between objects $i$ and $j$, with node features augmented by positional encoding. A Graph Neural Network processes this graph, and an LSTM then models the temporal dynamics to forecast the ego trajectory over a horizon (e.g., 5 steps). Separating spatial from temporal feature extraction is intended to overcome the limitations of CNN-based approaches. Evaluated on a synthetic CARLA dataset with Level 1 and Level 2 scenes, the method achieves competitive performance against traditional DNN-LSTM methods. It is implemented in PyTorch with PyTorch Geometric, with hyperparameters tuned using Optuna, and the paper provides a detailed experimental setup and analysis.
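The graph construction step can be illustrated with a minimal sketch. It assumes each detected object is reduced to a 2D centroid in BEV coordinates and that $d_{ij}$ is the Euclidean distance between centroids; the function name `knn_graph`, the value of `k`, the sample centroids, and the `eps` clamp that guards against division by zero are all illustrative assumptions, not details taken from the paper. Positional encoding and the GNN and LSTM stages are omitted.

```python
import math


def knn_graph(centroids, k=2, eps=1e-6):
    """Build a directed k-nearest-neighbour graph over BEV object centroids.

    Returns a list of (i, j, w_ij) tuples with w_ij = 1 / d_ij, where d_ij is
    the Euclidean distance between centroids i and j (an assumption here).
    """
    edges = []
    for i, (xi, yi) in enumerate(centroids):
        # Distance from node i to every other node.
        dists = [
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centroids)
            if j != i
        ]
        # Keep the k closest neighbours and weight each edge by inverse distance.
        for d, j in sorted(dists)[:k]:
            # eps is a hypothetical safeguard for coincident centroids.
            edges.append((i, j, 1.0 / max(d, eps)))
    return edges


# Hypothetical BEV centroids (x, y) for four objects in one frame.
centroids = [(0.0, 0.0), (3.0, 4.0), (0.0, 2.0), (10.0, 0.0)]
edges = knn_graph(centroids, k=2)
for i, j, w in edges:
    print(f"{i} -> {j}: w = {w:.3f}")
```

Closer objects receive larger weights, so a neighbour 2 units away contributes an edge weight of 0.5 while one 5 units away contributes 0.2. In a PyTorch Geometric setting, the `(i, j)` pairs would typically populate an edge index tensor and the weights an edge attribute tensor.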
Abstract
Predicting the trajectory of an ego vehicle is a critical component of autonomous driving systems. Current state-of-the-art methods typically rely on Deep Neural Networks (DNNs) and sequential models to process front-view images for future trajectory prediction. However, these approaches often struggle with perspective distortion, which affects the features of objects in the scene. To address this, we advocate for the use of Bird's Eye View (BEV) perspectives, which offer unique advantages in capturing spatial relationships and object homogeneity. In our work, we leverage Graph Neural Networks (GNNs) and positional encoding to represent objects in BEV, achieving competitive performance compared to traditional DNN-based methods. While the BEV-based approach loses some of the detailed information inherent to front-view images, we compensate by representing the BEV data as a graph that effectively captures the relationships between objects in the scene.
