
Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion

Shiqi Tan, Hamidreza Fazlali, Yixuan Xu, Yuan Ren, Bingbing Liu

TL;DR

This work tackles the limitations of Range-View-based 3D semantic segmentation for autonomous driving by introducing LaCRange, a multi-sensor fusion framework that distills distortion-free guidance from RGB images into a lightweight RV processor. It couples distortion-compensating knowledge distillation (DCKD) with a context-based feature fusion (CFF) module and a portable point refinement pipeline (SR²FA and 3D-NAFA) to mitigate projection distortions and preserve 3D topology. Across SemanticKITTI and nuScenes, LaCRange runs in real time with competitive or superior accuracy, and ablations show substantial gains from 3D neighborhood augmentation, the fusion strategy, and the distillation scheme. The proposed components are modular, offering plug-and-play improvements for existing RV-based segmentation pipelines and enabling more reliable perception in diverse driving conditions.
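
The distillation idea can be illustrated with a generic, Hinton-style knowledge-distillation loss that pulls a student's per-point class distribution toward a teacher's. This is a minimal sketch, not DCKD itself: the distortion-compensating parts of the paper's strategy are not reproduced, and the function names, the temperature of 2.0, and the plain per-point averaging are assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss_per_point(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher || student) for one point (generic KD, not the paper's DCKD)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0.0)
    return temperature ** 2 * kl

def kd_loss(student_batch, teacher_batch, temperature=2.0):
    """Mean per-point distillation loss over a batch of points (hypothetical helper)."""
    losses = [kd_loss_per_point(s, t, temperature)
              for s, t in zip(student_batch, teacher_batch)]
    return sum(losses) / len(losses)

# Identical logits give zero loss; reversed logits give a positive loss.
print(kd_loss([[1.0, 2.0, 3.0]], [[1.0, 2.0, 3.0]]))  # 0.0
print(kd_loss([[3.0, 2.0, 1.0]], [[1.0, 2.0, 3.0]]) > 0.0)  # True
```

The T² factor is the usual rescaling in temperature-based distillation: soft-target gradients shrink roughly as 1/T², so multiplying by T² keeps their magnitude comparable across temperatures.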

Abstract

Range-View (RV)-based 3D point cloud segmentation is widely adopted due to its compact data form. However, RV-based methods fail to segment occluded points robustly and, because 3D point clouds are sparse, suffer from distortion when RGB images are projected into the RV. To alleviate these problems, we propose LaCRange, a new LiDAR and Camera Range-View-based 3D point cloud semantic segmentation method. Specifically, a distortion-compensating knowledge distillation (DCKD) strategy is designed to remedy the adverse effect of RV projection on RGB images. Moreover, a context-based feature fusion module is introduced for robust and preservative sensor fusion. Finally, to address the limited resolution of RV and its insufficient representation of 3D topology, a new point refinement scheme is devised to properly aggregate features in 2D and augment point features in 3D. We evaluate the proposed method on large-scale autonomous driving datasets, i.e., SemanticKITTI and nuScenes. In addition to running in real time, the proposed method achieves state-of-the-art results on the nuScenes benchmark.
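
To see why occluded points are hard for RV methods, it helps to look at the projection itself. The sketch below uses the spherical projection commonly used by RV methods: each point is mapped to a pixel by its azimuth and elevation, and only the nearest return is kept when several points land on the same pixel. The image size (64 × 2048) and the vertical field of view (+3° to -25°) are assumptions typical of a 64-beam sensor such as the one in SemanticKITTI, not settings taken from the paper; the function names are hypothetical.

```python
import math

def spherical_projection(points, width=2048, height=64,
                         fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (x, y, z) LiDAR points onto a height x width range image.

    Returns (range_image, index_image, pixel_of_point). Empty pixels hold -1.
    Assumes every point has a nonzero range (none sits at the sensor origin).
    """
    fov_down = math.radians(fov_down_deg)
    fov = math.radians(fov_up_deg) - fov_down  # total vertical field of view
    range_image = [[-1.0] * width for _ in range(height)]
    index_image = [[-1] * width for _ in range(height)]
    pixel_of_point = []
    for i, (x, y, z) in enumerate(points):
        r = math.sqrt(x * x + y * y + z * z)
        yaw = math.atan2(y, x)       # azimuth in [-pi, pi]
        pitch = math.asin(z / r)     # elevation angle
        # Normalize angles to image coordinates, then floor and clamp to valid indices.
        u = 0.5 * (1.0 - yaw / math.pi) * width
        v = (1.0 - (pitch - fov_down) / fov) * height
        u = min(width - 1, max(0, math.floor(u)))
        v = min(height - 1, max(0, math.floor(v)))
        pixel_of_point.append((v, u))
        # Keep only the nearest return; farther points in the same pixel are dropped.
        if index_image[v][u] == -1 or r < range_image[v][u]:
            range_image[v][u] = r
            index_image[v][u] = i
    return range_image, index_image, pixel_of_point

def occluded_points(index_image, num_points):
    """Indices of points that lost their pixel to a nearer point."""
    visible = {i for row in index_image for i in row if i >= 0}
    return [i for i in range(num_points) if i not in visible]

# Points 0 and 1 lie on the same ray; point 2 looks in a different direction.
points = [(5.0, 0.0, -1.0), (10.0, 0.0, -2.0), (0.0, 5.0, 0.0)]
_, index_image, pixels = spherical_projection(points)
print(pixels)                                     # [(32, 1024), (32, 1024), (6, 512)]
print(occluded_points(index_image, len(points)))  # [1]
```

Here the two points on the same ray share one pixel, and the farther one is absent from the range image, so an RV network never sees it directly. Its label has to be recovered after projection back to 3D, which is the kind of gap the paper's point refinement is meant to close.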

Paper Structure

This paper contains 25 sections, 5 equations, 5 figures, and 7 tables.

Figures (5)

  • Figure 1: Overview of the proposed LaCRange framework. Dashed blocks and arrows are used only during training. The proposed components are shown in green. Best viewed in color.
  • Figure 2: Context-based feature fusion module. $C_{pre}/L_{pre}$ and $C_{comp}/L_{comp}$ represent the original camera/LiDAR and retrieved camera/LiDAR features, respectively. Additionally, $C_{s}/C_{c}$, $L_{s}/L_{c}$ and $F_{s}/F_{c}$ are the space-/channel-wise pooled camera, LiDAR and initially fused features.
  • Figure 3: Semantic-Range-Remission-based Feature Aggregation (SR$^{2}$FA) module. $\blacksquare$ and $\bigblacktriangleup$ mark the locations selected based on semantics and on range-remission, respectively.
  • Figure 4: 3D Neighborhood-Aware Feature Augmentation (3D-NAFA) module. Best viewed in color.
  • Figure 5: Segmentation results without (left) and with (right) the proposed point refinement. Best viewed in color.
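
Figures 4 and 5 concern refining point features in 3D. The basic mechanism of neighborhood-based augmentation can be sketched with a brute-force k-nearest-neighbor mean aggregation: each point's back-projected 2D feature is concatenated with the average feature of its k closest points in 3D. This is a simplified stand-in, not the paper's 3D-NAFA module; the function names, the mean pooling, the concatenation, and the value of k are all assumptions for illustration.

```python
def knn_indices(points, query, k):
    """Indices of the k nearest other points to points[query] (brute-force search)."""
    qx, qy, qz = points[query]
    candidates = []
    for j, (x, y, z) in enumerate(points):
        if j != query:
            dist_sq = (x - qx) ** 2 + (y - qy) ** 2 + (z - qz) ** 2
            candidates.append((dist_sq, j))
    candidates.sort()
    return [j for _, j in candidates[:k]]

def augment_with_neighbors(points, features, k=1):
    """Concatenate each point's feature with the mean feature of its k 3D neighbors."""
    augmented = []
    for i, feat in enumerate(features):
        neighbors = knn_indices(points, i, k)
        if neighbors:
            mean = [sum(features[j][c] for j in neighbors) / len(neighbors)
                    for c in range(len(feat))]
        else:  # a lone point has no neighbors, so fall back to zeros
            mean = [0.0] * len(feat)
        augmented.append(list(feat) + mean)
    return augmented

# Points 0 and 1 share one RV pixel, so both inherit the same 2D feature [1, 0].
points = [(5.0, 0.0, -1.0), (10.0, 0.0, -2.0), (5.1, 0.0, -1.0), (10.1, 0.0, -2.0)]
features = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
augmented = augment_with_neighbors(points, features, k=1)
print(augmented[0], augmented[1])  # [1.0, 0.0, 1.0, 0.0] [1.0, 0.0, 0.0, 1.0]
```

Although points 0 and 1 start with identical 2D features, their 3D neighborhoods differ, so the augmented features separate them. A practical version would use a spatial index (for example a k-d tree) instead of the quadratic search shown here.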