SegNet4D: Efficient Instance-Aware 4D Semantic Segmentation for LiDAR Point Cloud

Neng Wang, Ruibin Guo, Chenghao Shi, Ziyue Wang, Hui Zhang, Huimin Lu, Zhiqiang Zheng, Xieyuanli Chen

TL;DR

SegNet4D tackles real-time 4D LiDAR semantic segmentation by splitting the task into moving-object segmentation (MOS) and single-scan semantic segmentation (SSS), each handled by its own head, and then combining the two in a motion-semantic fusion module. It encodes motion cues from bird's-eye-view (BEV) residuals of sequential scans, which avoids costly 4D convolutions, and incorporates instance information at both the feature and point levels through an instance-aware backbone. An instance-refinement step further improves MOS accuracy. On SemanticKITTI and nuScenes, SegNet4D achieves state-of-the-art performance for both 4D segmentation and MOS while running in real time on embedded hardware, and its practicality is validated on a real unmanned ground platform.

Abstract

4D LiDAR semantic segmentation, also referred to as multi-scan semantic segmentation, plays a crucial role in enhancing the environmental understanding capabilities of autonomous vehicles and robots. It classifies the semantic category of each LiDAR measurement point and detects whether the point is dynamic, a critical ability for tasks like obstacle avoidance and autonomous navigation. Existing approaches often rely on computationally heavy 4D convolutions or recursive networks, which result in poor real-time performance and make them unsuitable for online robotics and autonomous driving applications. In this paper, we introduce SegNet4D, a novel real-time 4D semantic segmentation network offering both efficiency and strong semantic understanding. SegNet4D treats 4D segmentation as two tasks, single-scan semantic segmentation and moving object segmentation, each tackled by a separate network head. The two results are combined in a motion-semantic fusion module to achieve comprehensive 4D segmentation. Additionally, instance information is extracted from the current scan and exploited for instance-wise segmentation consistency. Our approach surpasses the state of the art in both multi-scan semantic segmentation and moving object segmentation while offering greater efficiency, enabling real-time operation. Its effectiveness and efficiency have also been validated on a real-world unmanned ground platform. Our code will be released at https://github.com/nubot-nudt/SegNet4D.

Paper Structure (19 sections, 9 equations, 8 figures, 5 tables)

Figures (8)

  • Figure 1: The proposed framework of SegNet4D. We first utilize the Motion Features Encoding Module to extract motion features from the sequential LiDAR scans. Following this, the motion features are concatenated with the spatial features of the current scan and fed into the Instance-Aware Feature Extraction Backbone. Then, two separate heads are applied: a motion head for predicting moving states, and a semantic head for predicting semantic category. Finally, the Motion-Semantic Fusion Module integrates the motion and semantic features to achieve 4D semantic segmentation.
  • Figure 2: The process for motion features encoding. We mainly calculate the residuals for the sequential BEV images and back-project them into the 3D space as the motion features.
  • Figure 3: Motion features visualization. (a) and (b) represent the motion features obtained from the current and past $N$-th scan. We compare the features with the network's predictions as well as ground truth.
  • Figure 4: The architecture of the Instance-Aware Feature Extraction Backbone. The Instance Detection Module is utilized to extract features from the input point cloud and predict the instance bounding box. We then integrate such instance information into the Upsample Fusion Module to achieve instance-aware segmentation.
  • Figure 5: The architecture of the Motion-Semantic Fusion Module. We mainly perform spatial attention and channel attention to fuse the motion features and the static semantic features.
  • ...and 3 more figures
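
The motion features encoding described in the Figure 2 caption (BEV images of sequential scans, residuals between them, back-projection to 3D) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that past scans are already transformed into the current scan's coordinate frame, that each BEV image is a binary occupancy grid (the paper's BEV images may store other quantities), and that the grid extent and cell size are arbitrary illustrative values. The function names bev_indices, bev_occupancy, and motion_features are hypothetical.

```python
import numpy as np


def bev_indices(points, x_range, y_range, res):
    """Map (N, 3) points to BEV grid cells; returns rows, cols, in-bounds mask, grid shape."""
    cols = np.floor((points[:, 0] - x_range[0]) / res).astype(np.int64)
    rows = np.floor((points[:, 1] - y_range[0]) / res).astype(np.int64)
    n_cols = int(round((x_range[1] - x_range[0]) / res))
    n_rows = int(round((y_range[1] - y_range[0]) / res))
    valid = (rows >= 0) & (rows < n_rows) & (cols >= 0) & (cols < n_cols)
    return rows, cols, valid, (n_rows, n_cols)


def bev_occupancy(points, x_range, y_range, res):
    """Binary BEV image: 1 where at least one point falls into the cell (an assumption)."""
    rows, cols, valid, shape = bev_indices(points, x_range, y_range, res)
    bev = np.zeros(shape, dtype=np.float32)
    bev[rows[valid], cols[valid]] = 1.0
    return bev


def motion_features(current, past_scans,
                    x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), res=0.5):
    """Per-point motion cues: one channel per past scan.

    Each channel is the absolute BEV residual between the current scan and one
    past scan, looked up at the cell of every current point (back-projection).
    Points outside the grid keep a feature value of 0.
    """
    rows, cols, valid, _ = bev_indices(current, x_range, y_range, res)
    cur_bev = bev_occupancy(current, x_range, y_range, res)
    feats = np.zeros((current.shape[0], len(past_scans)), dtype=np.float32)
    for k, past in enumerate(past_scans):
        residual = np.abs(cur_bev - bev_occupancy(past, x_range, y_range, res))
        feats[valid, k] = residual[rows[valid], cols[valid]]
    return feats
```

In this sketch, a point whose cell was also occupied in the past scan gets a residual of 0, while a point in a newly occupied cell gets 1. The resulting (N, K) array is the kind of per-point motion cue that Figure 1 describes as being concatenated with the spatial features of the current scan.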