LiSD: An Efficient Multi-Task Learning Framework for LiDAR Segmentation and Detection

Jiahua Xu, Si Zuo, Chenfeng Wei, Wei Zhou

TL;DR

This work addresses unified lidar perception, covering both semantic segmentation and 3D object detection, by introducing LiSD, a memory-efficient voxel-based framework. It combines three components: HIAM, which assimilates global context without densifying sparse data; HFCM, which fuses multi-scale features for robust voxel representations; and IARM, which refines foreground-point features using instance proposals. All three operate within a single forward pass, and training uses an uncertainty-weighted multi-task loss. Empirical results on nuScenes and the Waymo Open Dataset show LiSD achieving state-of-the-art lidar segmentation on nuScenes (83.3% mIoU) and competitive detection performance, with ablations confirming the contribution of each module. The approach demonstrates that efficient cross-task learning is practical for autonomous driving perception: it improves accuracy while preserving sparsity and reducing computation compared to heavier cross-task transformers.
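
The summary above mentions an uncertainty-weighted multi-task loss but does not give its exact form. As a minimal sketch, assuming the commonly used formulation in which each task carries a learnable log-variance s_i and the total is the sum of exp(-s_i) * L_i + s_i, the weighting can be illustrated as follows. The function name, the loss values, and the choice of this particular variant are illustrative assumptions, not details confirmed for LiSD.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses with per-task log-variances (assumed form).

    total = sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2).
    A larger s_i down-weights task i, while the additive s_i term
    penalizes growing the variance without bound.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per task loss")
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        total += math.exp(-s) * loss + s
    return total


# Hypothetical loss values for the segmentation and detection heads.
seg_loss, det_loss = 0.8, 1.5

# With both log-variances at zero, this reduces to a plain sum.
equal = uncertainty_weighted_loss([seg_loss, det_loss], [0.0, 0.0])

# Raising the detection log-variance shrinks that task's contribution.
down_weighted = uncertainty_weighted_loss([seg_loss, det_loss], [0.0, 1.0])
```

In a real training loop the log-variances would be trainable parameters updated alongside the network weights; plain floats are used here only to keep the sketch self-contained.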

Abstract

With the rapid proliferation of autonomous driving, there has been a heightened focus on the research of lidar-based 3D semantic segmentation and object detection methodologies, aiming to ensure the safety of traffic participants. In recent decades, learning-based approaches have emerged, demonstrating remarkable performance gains in comparison to conventional algorithms. However, the segmentation and detection tasks have traditionally been examined in isolation to achieve the best precision. To this end, we propose an efficient multi-task learning framework named LiSD which can address both segmentation and detection tasks, aiming to optimize the overall performance. Our proposed LiSD is a voxel-based encoder-decoder framework that contains a hierarchical feature collaboration module and a holistic information aggregation module. Different integration methods are adopted to keep sparsity in segmentation while densifying features for query initialization in detection. Besides, cross-task information is utilized in an instance-aware refinement module to obtain more accurate predictions. Experimental results on the nuScenes dataset and Waymo Open Dataset demonstrate the effectiveness of our proposed model. It is worth noting that LiSD achieves the state-of-the-art performance of 83.3% mIoU on the nuScenes segmentation benchmark for lidar-only methods.
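
For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of how mean intersection-over-union is computed from per-point class labels. The function name and the toy labels are illustrative assumptions. Benchmark evaluation typically accumulates counts over the whole dataset and may ignore certain classes, which this sketch does not attempt to reproduce.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the labels.

    For each class c, IoU = TP / (TP + FP + FN). Classes absent from both
    pred and target are skipped so they do not distort the average.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six points, three classes (hypothetical labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)  # per-class IoU: 1/3, 2/3, 1/2
```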

Paper Structure (13 sections, 7 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: The proposed LiSD model adopts point cloud data as its input and simultaneously produces semantic segmentation and object detection results.
  • Figure 2: The architecture of our proposed LiSD is outlined, where the point cloud serves as input for both segmentation and detection tasks. HIAM and HFCM are adopted in the voxel-based encoder-decoder to integrate holistic and hierarchical information for both tasks. Additionally, IARM is introduced to directly refine the point-level feature representation and indirectly exert influence on the box regression.
  • Figure 3: The detailed structure of the proposed HIAM. Additional down-samplings are adopted to acquire holistic information. Subsequently, feature interpolation is employed to aggregate the information for the voxel-based decoder preceding the segmentation head, while coordinates transformation is utilized to integrate the information for the detection head.
  • Figure 4: Semantic segmentation results on the nuScenes dataset. (a) Predicted results of LiSD without IARM, (b) Predicted results of LiSD with IARM, (c) Ground-truth segmentation labels.