Enhancing Steering Estimation with Semantic-Aware GNNs
Fouad Makiyeh, Huy-Dung Nguyen, Patrick Chareyre, Ramin Hasani, Marc Blanchon, Daniela Rus
TL;DR
This work demonstrates that incorporating 3D spatial information substantially improves steering estimation in autonomous driving. The authors systematically evaluate hybrid architectures that pair a 3D representation model (PointNet++ or a Graph Neural Network) with a temporal model (LSTM or Neural Circuit Policies). They also extend the approach to monocular 3D reconstructions obtained through a unified encoder, and find that GNN-based hybrids perform best. They further improve efficiency through semantic-aware graph pruning and show that pseudo-3D point clouds derived from monocular images can match or exceed LiDAR-based performance, achieving a 71% improvement over 2D baselines on KITTI. The approach also supports trajectory/path estimation, which points to practical benefits for cost-effective, robust 3D perception in real-world driving.
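A common way to turn a monocular depth estimate into a pseudo-3D point cloud is pinhole back-projection: each pixel (u, v) with depth z maps to x = (u - cx) z / fx and y = (v - cy) z / fy. The sketch below is a minimal illustration that assumes a standard pinhole camera with known intrinsics `fx`, `fy`, `cx`, `cy` and a dense metric depth map; the function name `depth_to_pseudo_point_cloud` is hypothetical, and the paper's exact reconstruction procedure (camera model, filtering, downsampling) is not specified here.

```python
import numpy as np


def depth_to_pseudo_point_cloud(depth, fx, fy, cx, cy, labels=None):
    """Back-project a dense depth map of shape (H, W) into camera-frame 3D points.

    Hypothetical helper. Assumes a pinhole camera model with intrinsics
    fx, fy (focal lengths in pixels) and cx, cy (principal point).
    Pixels with non-positive depth are treated as invalid and dropped.
    If a per-pixel semantic label map of shape (H, W) is given, the labels
    of the kept points are returned alongside them; otherwise None.
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    valid = points[:, 2] > 0  # drop pixels with missing or invalid depth
    points = points[valid]
    if labels is None:
        return points, None
    return points, labels.reshape(-1)[valid]
```

If a model also predicts a semantic class per pixel, passing that map as `labels` carries one label per 3D point through the back-projection, which is the kind of per-point labeling a semantic-aware graph needs downstream.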
Abstract
Steering estimation is a critical task in autonomous driving that has traditionally relied on 2D image-based models. In this work, we explore the advantages of incorporating 3D spatial information through hybrid architectures that combine 3D neural network models with recurrent neural networks (RNNs) for temporal modeling, using LiDAR-based point clouds as input. We systematically evaluate four hybrid 3D models, all of which outperform the 2D-only baseline, with the Graph Neural Network (GNN)-RNN model yielding the best results. To reduce reliance on LiDAR, we use a pretrained unified model to estimate depth from monocular images and reconstruct pseudo-3D point clouds from these estimates. We then adapt the GNN-RNN model, originally designed for LiDAR-based point clouds, to these pseudo-3D representations and achieve performance comparable to, or even better than, the LiDAR-based model. Additionally, the unified model provides a semantic label for each point, enabling a more structured scene representation. To further optimize graph construction, we introduce an efficient connectivity strategy in which connections are formed predominantly between points of the same semantic class, and only 20% of inter-class connections are retained. This targeted approach reduces graph complexity and computational cost while preserving critical spatial relationships. Finally, we validate our approach on the KITTI dataset, achieving a 71% improvement over 2D-only models. Our findings highlight the advantages of 3D spatial information and efficient graph construction for steering estimation, while retaining the cost-effectiveness of monocular cameras and avoiding the expense of LiDAR-based systems.
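The semantic connectivity rule described in the abstract can be sketched as a k-nearest-neighbor graph whose inter-class edges are subsampled. This is a minimal illustration under assumptions the abstract does not state: the base graph is a directed k-NN graph over 3D positions, `k=8` is an arbitrary default, and the retained 20% of inter-class edges are drawn uniformly at random with a fixed seed. The authors' actual neighborhood definition and selection rule may differ, and the function name `semantic_knn_edges` is hypothetical.

```python
import numpy as np


def semantic_knn_edges(points, labels, k=8, inter_class_keep=0.2, seed=0):
    """Build a directed k-NN edge list that keeps every intra-class edge and
    only a fraction `inter_class_keep` of the inter-class edges.

    Hypothetical helper. points: (N, 3) float array; labels: (N,) int array
    of semantic classes. Returns an (E, 2) int array of (source, target) indices.
    """
    n = points.shape[0]
    k = min(k, n - 1)  # a point cannot have more than N - 1 distinct neighbors
    # Brute-force pairwise squared distances: O(N^2) memory, fine for a sketch.
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist2, np.inf)  # exclude self-loops
    nbrs = np.argsort(dist2, axis=1)[:, :k]

    src = np.repeat(np.arange(n), k)
    dst = nbrs.reshape(-1)
    same = labels[src] == labels[dst]
    intra = np.flatnonzero(same)
    inter = np.flatnonzero(~same)

    # Assumption: inter-class edges are subsampled uniformly at random.
    rng = np.random.default_rng(seed)
    n_keep = int(round(inter_class_keep * inter.size))
    kept_inter = rng.choice(inter, size=n_keep, replace=False)

    keep = np.sort(np.concatenate([intra, kept_inter]))
    return np.stack([src[keep], dst[keep]], axis=1)
```

Because every intra-class edge survives and only a fifth of the cross-class edges do, the edge count drops roughly in proportion to how often a point's nearest neighbors belong to a different class, while each point stays connected to nearby points of its own class.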
