LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving

Sambit Mohapatra, Senthil Yogamani, Varun Ravi Kumar, Stefan Milz, Heinrich Gotzig, Patrick Mäder

TL;DR

This work provides the first embedded implementation unifying object detection, semantic segmentation, and motion segmentation from LiDAR point clouds, achieving 3 ms latency on the embedded NVIDIA Xavier platform. It also proposes a novel Semantic Weighting and Guidance (SWAG) module that selectively transfers semantic features to improve object detection.

Abstract

LiDAR is crucial for robust 3D scene perception in autonomous driving. LiDAR perception has the largest body of literature after camera perception. However, multi-task learning across tasks such as detection, segmentation, and motion estimation using LiDAR remains relatively unexplored, especially on automotive-grade embedded platforms. We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantic segmentation, and motion segmentation. The unified architecture comprises a shared encoder and task-specific decoders, enabling joint representation learning. We propose a novel Semantic Weighting and Guidance (SWAG) module that selectively transfers semantic features to improve object detection. Our heterogeneous training scheme combines diverse datasets and exploits complementary cues between tasks. The work provides the first embedded implementation unifying these key perception tasks from LiDAR point clouds, achieving 3 ms latency on the embedded NVIDIA Xavier platform. We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection. By maximizing hardware efficiency and leveraging multi-task synergies, our method delivers an accurate and efficient solution tailored for real-world automated driving deployment. Qualitative results can be seen at https://youtu.be/H-hWRzv2lIY.
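
The shared-encoder, task-specific-decoder layout described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' network: the class name `MultiTaskSketch`, the channel counts, the use of 1x1 convolutions, and the sigmoid gate standing in for semantic guidance are all assumptions made for illustration. The abstract does not specify how the SWAG module is built, so the gate below only conveys the general idea of semantic features selectively re-weighting the features fed to the detection head.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def conv1x1(x, w):
    """Apply a 1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class MultiTaskSketch:
    """Hypothetical toy model: one shared encoder, three task heads."""

    def __init__(self, in_ch=3, feat_ch=8, n_sem=4, n_det=6):
        # All sizes are illustrative assumptions, not values from the paper.
        self.w_enc = rng.normal(size=(feat_ch, in_ch)) * 0.1
        self.w_sem = rng.normal(size=(n_sem, feat_ch)) * 0.1
        self.w_mot = rng.normal(size=(1, feat_ch)) * 0.1
        self.w_gate = rng.normal(size=(feat_ch, n_sem)) * 0.1
        self.w_det = rng.normal(size=(n_det, feat_ch)) * 0.1

    def forward(self, bev):
        # Shared encoder: computed once and reused by every task head.
        shared = relu(conv1x1(bev, self.w_enc))
        # Task-specific heads for semantic and motion segmentation.
        sem_logits = conv1x1(shared, self.w_sem)
        motion_logits = conv1x1(shared, self.w_mot)
        # Assumed stand-in for semantic guidance: semantic logits produce
        # per-pixel, per-channel weights in (0, 1) that re-weight the
        # shared features before the detection head.
        gate = sigmoid(conv1x1(sem_logits, self.w_gate))
        det_out = conv1x1(shared * gate, self.w_det)
        return {"detection": det_out, "semantic": sem_logits, "motion": motion_logits}


model = MultiTaskSketch()
outputs = model.forward(np.zeros((3, 16, 16)))
shapes = {name: out.shape for name, out in outputs.items()}
```

Because the encoder runs once per frame and only the lightweight heads are task-specific, adding a task costs far less than running a separate network for it, which is the efficiency argument the abstract makes for embedded deployment.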

Paper Structure

This paper contains 16 sections, 12 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: High-level Architecture of Level 2/Level 3 Autonomous Driving System.
  • Figure 2: Illustration of the proposed range-based point cloud densification: raw point cloud (left) and densified point cloud (right).
  • Figure 3: Effect of point densification on objects in BEV: raw BEV (left) and densified BEV (right).
  • Figure 4: LiDAR-BEVMTN multi-task learning architecture. Top: High-level view showing the shared encoder and task-specific decoders. Bottom: Details of the encoder and decoder sub-modules. Key features: 1) a shared encoder for unified feature extraction; 2) the SWAG module for cross-task interactions; 3) asynchronous training across datasets; 4) task-specific decoders that enable multi-task pixel-level prediction.
  • Figure 5: Qualitative performance comparison of multi-task learning (MTL) and single-task learning (STL) approaches for object detection, semantic segmentation, and motion segmentation. The MTL approach improves all tasks for the shown failure cases of STL. The results can be viewed in higher quality in the video https://youtu.be/H-hWRzv2lIY.
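
Figures 2 and 3 refer to a bird's-eye view (BEV) representation of the point cloud. As a rough illustration of what rasterizing LiDAR points into a BEV grid involves, the sketch below bins points into cells on the ground plane and records a point count and maximum height per cell. The function name `points_to_bev`, the grid extents, the cell size, and the choice of per-cell features are assumptions for illustration; they are not the paper's preprocessing, and the densification step from Figure 2 is not reproduced here.

```python
def points_to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=0.5):
    """Bin (x, y, z) points into a BEV grid.

    Returns two row-major grids: per-cell point counts and per-cell
    maximum height (None where a cell is empty). All ranges are in metres
    and are illustrative assumptions.
    """
    n_rows = int((x_range[1] - x_range[0]) / cell)
    n_cols = int((y_range[1] - y_range[0]) / cell)
    counts = [[0] * n_cols for _ in range(n_rows)]
    max_height = [[None] * n_cols for _ in range(n_rows)]

    for x, y, z in points:
        # Skip points outside the region covered by the grid.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Clamp indices to guard against floating-point rounding at the edge.
        row = min(int((x - x_range[0]) / cell), n_rows - 1)
        col = min(int((y - y_range[0]) / cell), n_cols - 1)
        counts[row][col] += 1
        if max_height[row][col] is None or z > max_height[row][col]:
            max_height[row][col] = z

    return counts, max_height


# Two nearby points fall into the same cell; the third lies outside the grid.
counts, max_height = points_to_bev([(1.0, 0.0, 0.2), (1.1, 0.1, 0.5), (100.0, 0.0, 0.0)])
```

Sparse regions of the point cloud leave many cells empty in such a grid, which is the kind of sparsity that a densification step like the one shown in Figure 3 is meant to reduce.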