Are We Ready for Real-Time LiDAR Semantic Segmentation in Autonomous Driving?

Samir Abou Haidar, Alexandre Chariot, Mehdi Darouich, Cyril Joly, Jean-Emmanuel Deschaud

TL;DR

This work benchmarks multiple 3D LiDAR semantic segmentation networks on embedded NVIDIA Jetson platforms (Orin and Xavier), using a standardized training protocol on SemanticKITTI and nuScenes. It compares projection-based, point-based, sparse-convolution, and fusion methods, and identifies pre-processing as a major bottleneck that often dominates total runtime. SalsaNext comes closest to real-time execution on Jetson hardware, but at a cost in accuracy, while the other architectures struggle to meet real-time constraints. Pruning reduces parameter count but harms performance, and the heavy pre-processing of point-based methods hinders practical deployment. The study underscores the need for substantial hardware-aware redesign, focused on pre-processing optimization, sparsity exploitation, and pipeline-overlap strategies, to enable real-time embedded LiDAR semantic segmentation in autonomous systems.

Abstract

Within a perception framework for autonomous mobile and robotic systems, semantic analysis of 3D point clouds, typically generated by LiDARs, is key to numerous applications such as object detection and recognition and scene reconstruction. Scene semantic segmentation can be achieved by directly integrating 3D spatial data with specialized deep neural networks. Although this type of data provides rich geometric information about the surrounding environment, it also presents numerous challenges: its unstructured and sparse nature, its unpredictable size, and its demanding computational requirements. These characteristics hinder real-time semantic analysis, particularly on the resource-constrained hardware architectures that constitute the main computational components of numerous robotic applications. Therefore, in this paper, we investigate various 3D semantic segmentation methodologies and analyze their performance and capabilities for resource-constrained inference on embedded NVIDIA Jetson platforms. To ensure a fair comparison, we evaluate them with a standardized training protocol and data augmentations, providing benchmark results on the Jetson AGX Orin and AGX Xavier series for two large-scale outdoor datasets: SemanticKITTI and nuScenes.

Paper Structure

This paper contains 15 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: With a 20 Hz acquisition sensor on nuScenes, only SalsaNext runs in real time on the RTX 4090 and Jetson AGX Orin. Its mIoU, however, falls behind that of the other models, which are far from real-time execution on Jetson platforms.
  • Figure 2: mIoU vs. runtime on SemanticKITTI (10 Hz input sensor).
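
The sensor rates in the captions above imply a per-frame latency budget: 50 ms at 20 Hz (nuScenes) and 100 ms at 10 Hz (SemanticKITTI). The sketch below illustrates that check in Python. It assumes that "real-time" means end-to-end latency (pre-processing plus inference plus post-processing) no greater than the sensor period; the paper's exact criterion may differ. The class name, function names, and timing values are hypothetical and are not taken from the paper's measurements.

```python
from dataclasses import dataclass


@dataclass
class StageTimings:
    """Per-frame latencies in milliseconds (illustrative values only)."""
    pre_ms: float          # pre-processing, e.g. projection or voxelization
    inference_ms: float    # network forward pass
    post_ms: float = 0.0   # post-processing, e.g. mapping labels back to points

    @property
    def total_ms(self) -> float:
        # End-to-end latency, assuming the stages run sequentially (no overlap).
        return self.pre_ms + self.inference_ms + self.post_ms


def frame_budget_ms(sensor_hz: float) -> float:
    """Time available per frame for a sensor publishing at sensor_hz."""
    return 1000.0 / sensor_hz


def is_real_time(timings: StageTimings, sensor_hz: float) -> bool:
    """Assumed criterion: total latency must fit within one sensor period."""
    return timings.total_ms <= frame_budget_ms(sensor_hz)


def pre_processing_share(timings: StageTimings) -> float:
    """Fraction of the end-to-end latency spent in pre-processing."""
    return timings.pre_ms / timings.total_ms


if __name__ == "__main__":
    # Hypothetical pipeline whose pre-processing dominates the runtime.
    example = StageTimings(pre_ms=60.0, inference_ms=30.0)
    for hz in (10.0, 20.0):
        print(f"{hz:.0f} Hz: budget {frame_budget_ms(hz):.0f} ms, "
              f"total {example.total_ms:.0f} ms, "
              f"real-time: {is_real_time(example, hz)}")
    print(f"pre-processing share: {pre_processing_share(example):.0%}")
```

Under this assumed criterion, the same pipeline can be real-time at 10 Hz and miss the budget at 20 Hz, which is why the two datasets must be judged against different thresholds. Overlapping pre-processing with inference would change the relevant quantity from the sum of the stages to the slowest stage, which this sequential sketch does not model.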