Enhancing Steering Estimation with Semantic-Aware GNNs

Fouad Makiyeh, Huy-Dung Nguyen, Patrick Chareyre, Ramin Hasani, Marc Blanchon, Daniela Rus

TL;DR

This work demonstrates that incorporating 3D spatial information substantially enhances steering estimation in autonomous driving. By systematically evaluating hybrid architectures that combine 3D representations (PointNet++ or Graph Neural Networks) with temporal models (LSTM or Neural Circuit Policies), and by extending the approach to monocular 3D reconstructions via a unified encoder, the authors show that GNN-based hybrids achieve the best performance. They further improve efficiency through semantic-aware graph pruning and show that pseudo-3D point clouds derived from monocular images can match or exceed LiDAR-based performance, achieving a 71% improvement over 2D baselines on KITTI. The approach also enables trajectory/path estimation, demonstrating practical benefits for cost-effective, robust 3D perception in real-world driving scenarios.
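The pseudo-3D reconstruction step can be illustrated with standard pinhole back-projection. The sketch below is an assumption, not the authors' code: it takes a dense depth map and per-pixel semantic labels (the kind of output the unified model is described as producing) plus camera intrinsics fx, fy, cx, cy, and lifts each pixel with a valid depth into a labeled 3D point in the camera frame. The function name `backproject_depth` and the `stride` down-sampling parameter are hypothetical.

```python
from typing import List, Tuple

# (x, y, z, semantic_label) in the camera frame
Point = Tuple[float, float, float, int]


def backproject_depth(
    depth: List[List[float]],
    labels: List[List[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    stride: int = 1,  # hypothetical: keep every `stride`-th pixel to down-sample
) -> List[Point]:
    """Lift a dense depth map to a labeled pseudo-3D point cloud.

    Pinhole model: a pixel (u, v) with depth Z maps to
    X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy.
    """
    points: List[Point] = []
    for v in range(0, len(depth), stride):
        for u in range(0, len(depth[v]), stride):
            z = depth[v][u]
            if z <= 0.0:  # skip pixels without a valid depth estimate
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z, labels[v][u]))
    return points
```

On KITTI, the intrinsics would come from the dataset's calibration files. The axis convention used here (x right, y down, z forward) is the usual camera frame and may differ from the frame used in the paper.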

Abstract

Steering estimation is a critical task in autonomous driving, traditionally relying on 2D image-based models. In this work, we explore the advantages of incorporating 3D spatial information through hybrid architectures that combine 3D neural network models with recurrent neural networks (RNNs) for temporal modeling, using LiDAR-based point clouds as input. We systematically evaluate four hybrid 3D models, all of which outperform the 2D-only baseline, with the Graph Neural Network (GNN)-RNN model yielding the best results. To reduce reliance on LiDAR, we leverage a pretrained unified model to estimate depth from monocular images and reconstruct pseudo-3D point clouds. We then adapt the GNN-RNN model, originally designed for LiDAR-based point clouds, to work with these pseudo-3D representations, achieving performance comparable or even superior to that of the LiDAR-based model. Additionally, the unified model provides semantic labels for each point, enabling a more structured scene representation. To further optimize graph construction, we introduce an efficient connectivity strategy in which connections are formed predominantly between points of the same semantic class, with only 20% of inter-class connections retained. This targeted approach reduces graph complexity and computational cost while preserving critical spatial relationships. Finally, we validate our approach on the KITTI dataset, achieving a 71% improvement over 2D-only models. Our findings highlight the advantages of 3D spatial information and efficient graph construction for steering estimation, while maintaining the cost-effectiveness of monocular images and avoiding the expense of LiDAR-based systems.
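The semantic-aware connectivity rule can be sketched as a two-step filter over a point graph. This is a minimal sketch under stated assumptions: the base graph is a brute-force k-nearest-neighbor graph, and the retained 20% of inter-class edges are chosen by uniform random sampling. The abstract does not specify how the base graph is built or how inter-class edges are selected, and the names `knn_edges`, `semantic_prune`, `k`, `inter_keep`, and `seed` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def knn_edges(coords: Sequence[Tuple[float, float, float]], k: int) -> Set[Edge]:
    """Undirected k-nearest-neighbor edges, each stored as (i, j) with i < j."""
    edges: Set[Edge] = set()
    for i, p in enumerate(coords):
        # Brute-force O(n^2) search; acceptable for a down-sampled cloud.
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(coords) if j != i
        )
        for _, j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def semantic_prune(
    edges: Set[Edge],
    labels: Sequence[int],
    inter_keep: float = 0.2,  # fraction of inter-class edges to retain
    seed: int = 0,
) -> List[Edge]:
    """Keep every intra-class edge plus a random fraction of inter-class edges.

    Assumption: inter-class edges are sampled uniformly at random.
    """
    rng = random.Random(seed)
    ordered = sorted(edges)  # deterministic order before sampling
    intra = [e for e in ordered if labels[e[0]] == labels[e[1]]]
    inter = [e for e in ordered if labels[e[0]] != labels[e[1]]]
    n_keep = round(inter_keep * len(inter))
    kept_inter = rng.sample(inter, n_keep)
    return sorted(intra + kept_inter)
```

Fixing the seed makes the pruned graph reproducible across runs. A distance-based rule (for example, keeping the shortest inter-class edges) would be an equally plausible reading of the abstract.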

Paper Structure

This paper contains 15 sections, 10 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Illustration of our approach, from classical 2D models to learned 3D representations: (I) classical 2D models, (II) 3D models with 3D point clouds, and (III) pseudo-point clouds from monocular images with a learned unified encoder.
  • Figure 2: Illustration of the original RGB input (top left), estimated dense depth (top right), semantic segmentation prediction (bottom left), and semantic-based point cloud (bottom right).
  • Figure 3: Illustration of graph reduction and semantic graphs. Left: the initial graph built from the down-sampled point cloud. Right: the semantic-aware graph, where node colors represent semantic classes. Filtering reduces the number of edges from 600 (left) to 296 (right).
  • Figure 4: Visualization of different driving scenarios. Icons and arrows are red for ground truth and green for prediction.
  • Figure 5: Predicted vehicle trajectories using steering estimation from different 3D models.
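Figure 5 shows trajectories derived from predicted steering. One common way to turn a steering sequence into a path is to integrate a kinematic bicycle model; the paper's exact procedure is not described here, so the sketch below is an assumed stand-in rather than the authors' method. It assumes the steering signal is the front-wheel angle in radians and that speed is known at each step, and it uses a hypothetical wheelbase of 2.7 m and a 0.1 s time step. The name `integrate_bicycle` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pose = Tuple[float, float, float]  # (x, y, heading) in metres / radians


def integrate_bicycle(
    steering: Sequence[float],  # assumed: front-wheel angle per step, radians
    speed: Sequence[float],     # assumed: longitudinal speed per step, m/s
    wheelbase: float = 2.7,     # hypothetical wheelbase in metres
    dt: float = 0.1,            # assumed time step in seconds
) -> List[Pose]:
    """Roll out poses with forward-Euler kinematic bicycle updates.

    x' = v cos(theta), y' = v sin(theta), theta' = (v / L) tan(delta)
    """
    x, y, theta = 0.0, 0.0, 0.0
    poses: List[Pose] = [(x, y, theta)]
    for delta, v in zip(steering, speed):
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        theta += v / wheelbase * math.tan(delta) * dt
        poses.append((x, y, theta))
    return poses
```

Running the ground-truth and predicted steering sequences through the same function, with the same speed profile, yields two paths that can be overlaid as in Figure 5. Forward Euler is used only for simplicity; its error accumulates over long horizons, so a smaller dt or a higher-order integrator would suit long rollouts better.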