Optimizing Ego Vehicle Trajectory Prediction: The Graph Enhancement Approach

Sushil Sharma, Aryan Singh, Ganesh Sistu, Mark Halton, Ciarán Eising

TL;DR

This work tackles ego-vehicle trajectory prediction for autonomous driving by using Bird's Eye View (BEV) representations to encode spatial relations among scene objects. The proposed pipeline uses the Segment Anything Model (SAM) to obtain object candidates, extracts features with DNNs, and builds a k-nearest-neighbor (KNN) graph with edge weights $w_{ij}=1/d_{ij}$, augmented by positional encoding. A Graph Neural Network (GNN) processes this graph, and an LSTM then models temporal dynamics to forecast the ego trajectory over a horizon (e.g., 5 steps). The approach separates spatial and temporal features to overcome the limitations of CNN-based methods, and it reports competitive performance against traditional DNN-LSTM methods on a synthetic CARLA dataset with Level 1 and Level 2 scenes. The method is implemented in PyTorch with PyTorch Geometric and tuned with Optuna, and the paper describes the experimental setup and analysis in detail.
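
The graph-construction step can be illustrated with a minimal, dependency-free sketch. It assumes that $d_{ij}$ is the Euclidean distance between object centres in the BEV plane, which the summary does not state explicitly. The function name `knn_graph`, the example coordinates, and the small floor on the distance are hypothetical choices made for illustration, not details taken from the paper.

```python
import math

def knn_graph(centers, k):
    """Link each node to its k nearest neighbours.

    Returns directed edges (i, j, w_ij) with w_ij = 1 / d_ij, where d_ij is
    assumed here to be the Euclidean distance between centres i and j.
    """
    edges = []
    for i, (xi, yi) in enumerate(centers):
        # Distance from node i to every other node, paired with that node's index.
        dists = [
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        ]
        dists.sort()
        for d, j in dists[:k]:
            # The 1e-6 floor avoids division by zero for coincident centres
            # (an assumption of this sketch, not part of the paper).
            edges.append((i, j, 1.0 / max(d, 1e-6)))
    return edges

# Hypothetical BEV object centres.
centers = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
edges = knn_graph(centers, k=2)
```

With inverse-distance weights, nearby objects receive larger edge weights than distant ones, so a GNN that aggregates neighbour features by edge weight is influenced most by the closest objects.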

Abstract

Predicting the trajectory of an ego vehicle is a critical component of autonomous driving systems. Current state-of-the-art methods typically rely on Deep Neural Networks (DNNs) and sequential models to process front-view images for future trajectory prediction. However, these approaches often struggle with perspective issues affecting object features in the scene. To address this, we advocate for the use of Bird's Eye View (BEV) perspectives, which offer unique advantages in capturing spatial relationships and object homogeneity. In our work, we leverage Graph Neural Networks (GNNs) and positional encoding to represent objects in a BEV, achieving competitive performance compared to traditional DNN-based methods. While the BEV-based approach loses some of the detailed information inherent to front-view images, we compensate by representing the BEV data as a graph that effectively captures the relationships between the objects in a scene.
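
The abstract does not say which positional encoding is used. As one possible illustration, the sketch below assumes the sinusoidal scheme familiar from Transformer models, applied separately to the x and y coordinates of a bounding-box centre. The function names, the encoding dimension, and the base of 10000 are all assumptions of this sketch.

```python
import math

def sinusoidal_encoding(position, dim):
    """Encode a scalar position as `dim` interleaved sine/cosine values."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    enc = []
    for i in range(dim // 2):
        # Frequencies decrease geometrically with the pair index i.
        freq = 1.0 / (10000 ** (2 * i / dim))
        enc.append(math.sin(position * freq))
        enc.append(math.cos(position * freq))
    return enc

def encode_box_center(cx, cy, dim=8):
    """Concatenate the encodings of the x and y centre coordinates."""
    return sinusoidal_encoding(cx, dim) + sinusoidal_encoding(cy, dim)

# Hypothetical box centre in BEV pixel coordinates.
feature = encode_box_center(12.0, 40.0)
```

A vector like `feature` could be concatenated with an object's DNN features so that each graph node also carries its location in the BEV.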

Paper Structure (1 section, 2 figures)

Table of Contents

  1. INTRODUCTION

Figures (2)

  • Figure 1: Overview: The Segment Anything Model (SAM) extracts bounding-box information. A GNN processes the resulting graph to capture spatial feature relations, and LSTM layers predict the ego vehicle's trajectory.
  • Figure 2: Our proposed architecture: Semantic segmentation derives bounding-box coordinates and mask details from a BEV. A DNN uses this information to inform a KNN step, which connects the boxes to create a graph. A GNN, enhanced with positional encoding, captures spatial features, while LSTM layers integrate temporal dynamics to predict the ego vehicle's trajectory.