
Small, Versatile and Mighty: A Range-View Perception Framework

Qiang Meng, Xiao Wang, JiaBao Wang, Liujiang Yan, Ke Wang

TL;DR

This paper tackles 3D perception from LiDAR by leveraging the range-view representation to achieve both efficiency and multi-task capability. It introduces a lightweight, fully convolutional framework called Small, Versatile, and Mighty (SVM) that integrates Perspective Centric Label Assignment (PCLA) and View Adaptive Regression (VAR) to boost 3D detection while enabling semantic and panoptic segmentation without extra modules. The approach achieves state-of-the-art performance among range-view detectors on the Waymo Open Dataset, with notable gains for the vehicle class and strong segmentation results. These findings demonstrate the viability of range-view representations for real-time, multi-task LiDAR perception in autonomous driving.

Abstract

Despite its compactness and information integrity, the range-view representation of LiDAR data is rarely the first choice for 3D perception tasks. In this work, we push the envelope of the range-view representation further with a novel multi-task framework that achieves unprecedented 3D detection performance. Our proposed Small, Versatile, and Mighty (SVM) network uses a purely convolutional architecture to fully unleash the efficiency and multi-tasking potential of the range-view representation. To boost detection performance, we first propose a range-view-specific Perspective Centric Label Assignment (PCLA) strategy, and then a novel View Adaptive Regression (VAR) module that further refines hard-to-predict box properties. In addition, our framework seamlessly integrates semantic segmentation and panoptic segmentation of the LiDAR point cloud without extra modules. Among range-view-based methods, our model achieves new state-of-the-art detection performance on the Waymo Open Dataset. In particular, it improves vehicle-class mAP by more than 10 points over convolutional counterparts. Our results on the other tasks further demonstrate the multi-task capabilities of the proposed small but mighty framework.
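For readers new to the range view: a LiDAR sweep can be turned into a 2D range image by spherical projection, indexing each return by its elevation (row) and azimuth (column) and storing its distance from the sensor. The sketch below is a generic projection written for illustration, not the paper's preprocessing; the function name `project_to_range_image`, the image size, and the vertical field-of-view parameters are assumptions.

```python
import math


def project_to_range_image(points, height, width, fov_up_deg, fov_down_deg):
    """Spherically project (x, y, z) LiDAR points onto a height x width range image.

    Each pixel stores the range of the closest point that lands on it; pixels
    with no return keep the value -1.0 (treated as invalid).
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down  # total vertical field of view (fov_down is negative)
    image = [[-1.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # skip degenerate points at the sensor origin
        yaw = math.atan2(y, x)    # azimuth in (-pi, pi]
        pitch = math.asin(z / r)  # elevation angle
        # Column: yaw = pi maps to column 0, yaw = 0 to the middle column.
        col = int(0.5 * (1.0 - yaw / math.pi) * width)
        # Row: the top row corresponds to fov_up, the bottom row to fov_down.
        row = int((fov_up - pitch) / fov * height)
        col = min(max(col, 0), width - 1)
        row = min(max(row, 0), height - 1)
        # Keep the nearest return when several points share a pixel.
        if image[row][col] < 0.0 or r < image[row][col]:
            image[row][col] = r
    return image
```

Keeping the nearest return per pixel is one common convention; other pipelines resolve collisions differently.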

Paper Structure (22 sections, 13 equations, 6 figures, 14 tables, 1 algorithm)

Figures (6)

  • Figure 1: Our range-view-based framework generates the following predictions: the semantic class $s$ for each valid point, enabling semantic segmentation; the center-ness $c$, offsets $\Omega_y$, $\Omega_z$, and box height $h$ for each foreground point, which make panoptic segmentation possible; and the remaining 3D detection elements, which are regressed only for centric points within boxes in the perspective view. In (d), $\bm{\times}$ and $\bullet$ denote directions pointing into and out of the plane, respectively.
  • Figure 2: Comparison of label assignments used in range-view-based detectors. The top row shows range-view results; the bottom row shows the corresponding point clouds and bounding boxes. (a) The raw range image and point cloud. (b) All points within a box are treated as foreground duan2019centernet, tian2019fcos, zhang2020bridging. (c) PPC chai2021point assigns foreground points with a Gaussian function of the normalized 3D distance between each point and its box center. (d) Our proposed Perspective Centric Label Assignment, which is better suited to the range-view representation.
  • Figure 3: The proposed range-view-based perception system employs three branches after extracting deep features from a range image. The classification branch predicts semantic labels for all points, enabling point cloud segmentation. Together with the center-ness scores from the classification branch, the perspective-view regression branch predicts $\Omega_y$, $\Omega_z$, and $h$ for valid object points; these outputs jointly support panoptic segmentation. The final branch performs bird's-eye-view regression for the remaining box elements, which are regressed only for centric points within boxes and complete the 3D detection box predictions.
  • Figure 4: Loss curves of $\Omega_x$, $\Omega_y$, $\Omega_z$ when training a range-view-based object detector on the vehicle and pedestrian classes of the Waymo Open Dataset sun2020scalability. For both classes, the losses of $\Omega_y$ and $\Omega_z$ end training noticeably lower than that of $\Omega_x$.
  • Figure 5: Visualization of panoptic segmentation with our framework. (a) After $\Omega_y$ and $\Omega_z$ are added to their 3D coordinates, the points of an object are distributed along a ray. Stars mark points with high center-ness scores. (b) Conventional clustering of the offset points yields unsatisfactory results. (c) Incorporating center-ness information and a new distance metric notably improves the results.
  • ...and 1 more figure
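Figures 1 and 3 together describe which points supervise which outputs: semantic classes for every valid point, center-ness and the perspective-view quantities ($\Omega_y$, $\Omega_z$, $h$) for foreground points, and the bird's-eye-view elements only for centric points. The sketch below encodes that routing as index masks. The `PixelTarget` fields, the mask names, and `supervision_masks` are hypothetical, and the `centric` flag is taken as an input because the exact PCLA rule is not given in this summary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PixelTarget:
    valid: bool       # the range-image pixel holds a LiDAR return
    foreground: bool  # the point lies inside some ground-truth box
    centric: bool     # chosen as a "centric" point by label assignment (assumed given)


def supervision_masks(targets: List[PixelTarget]) -> Dict[str, List[int]]:
    """Indices of pixels that contribute to each group of losses (hypothetical routing)."""
    masks: Dict[str, List[int]] = {
        "semantic": [],           # semantic class: every valid point
        "centerness_and_pv": [],  # center-ness, omega_y, omega_z, h: foreground points
        "bev": [],                # remaining box elements: centric foreground points
    }
    for i, t in enumerate(targets):
        if not t.valid:
            continue  # empty pixels receive no supervision at all
        masks["semantic"].append(i)
        if t.foreground:
            masks["centerness_and_pv"].append(i)
            if t.centric:
                masks["bev"].append(i)
    return masks
```

Because every valid pixel still receives a semantic loss in this routing, segmentation and detection share one forward pass, which is one way to read the claim that no extra modules are needed.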
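Figure 2 contrasts two baseline label assignments with PCLA. The baselines can be sketched as per-point weights: (b) a hard 0/1 label for points inside the box, and (c) a PPC-style soft label given by a Gaussian of the normalized distance to the box center. The normalization (each local-frame offset divided by the box half-extent), the default `sigma`, and all function names are assumptions for illustration; PCLA itself is not reproduced because its definition is not included here.

```python
import math


def to_box_frame(point, center, heading):
    """Express a point in the box's local frame (rotation by -heading about z)."""
    dx, dy, dz = point[0] - center[0], point[1] - center[1], point[2] - center[2]
    c, s = math.cos(heading), math.sin(heading)
    return (c * dx + s * dy, -s * dx + c * dy, dz)


def inside_box(local, size):
    """True if a local-frame offset lies within a box of size (length, width, height)."""
    l, w, h = size
    return abs(local[0]) <= l / 2 and abs(local[1]) <= w / 2 and abs(local[2]) <= h / 2


def all_inside_label(point, center, size, heading):
    """Strategy (b): every point inside the box is foreground with weight 1."""
    return 1.0 if inside_box(to_box_frame(point, center, heading), size) else 0.0


def gaussian_label(point, center, size, heading, sigma=0.5):
    """Strategy (c)-style soft label: Gaussian of the normalized distance to the center.

    Offsets are divided by the box half-extents (an assumed normalization), so the
    box faces sit at normalized distance 1 along each local axis.
    """
    local = to_box_frame(point, center, heading)
    if not inside_box(local, size):
        return 0.0
    l, w, h = size
    d2 = (local[0] / (l / 2)) ** 2 + (local[1] / (w / 2)) ** 2 + (local[2] / (h / 2)) ** 2
    return math.exp(-d2 / (2 * sigma ** 2))
```

The soft label decays toward the box faces, while the hard label treats every interior point equally; both are computed from 3D geometry alone, which is the property the caption contrasts with a perspective-centric assignment.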
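Figures 4 and 5 refer to offsets $\Omega_x$, $\Omega_y$, $\Omega_z$ without defining their frame here. The observation in Figure 5(a), that adding $\Omega_y$ and $\Omega_z$ collapses an object's points along a ray, suggests that $\Omega_y$ and $\Omega_z$ are perpendicular to the viewing ray and $\Omega_x$ lies along it. The sketch below assumes exactly that: x along the sensor-to-point ray, y horizontal, z completing a right-handed frame. The frame choice and the names `ray_frame` and `ray_offsets` are hypothetical and not taken from the paper.

```python
import math


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ray_frame(point):
    """Orthonormal frame: x along the sensor->point ray, y horizontal, z = x cross y."""
    ex = _normalize(point)
    ey = _normalize(_cross((0.0, 0.0, 1.0), ex))  # raises ValueError for a vertical ray
    ez = _cross(ex, ey)
    return ex, ey, ez


def ray_offsets(point, center):
    """Split the point-to-center offset into (omega_x, omega_y, omega_z) in the ray frame."""
    delta = tuple(c - p for c, p in zip(center, point))
    ex, ey, ez = ray_frame(point)
    return _dot(delta, ex), _dot(delta, ey), _dot(delta, ez)
```

Under this assumed frame, $\Omega_x$ is the along-ray (depth) component, which offers one plausible intuition for why its loss stays higher than those of $\Omega_y$ and $\Omega_z$ in Figure 4.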