
Cross-Cluster Shifting for Efficient and Effective 3D Object Detection in Autonomous Driving

Zhili Chen, Kien T. Pham, Maosheng Ye, Zhiqiang Shen, Qifeng Chen

TL;DR

The paper addresses non-local information loss in 3D point-based detectors due to downsampling. It introduces Shift-SSD, featuring Cross-Cluster Shifting to enable long-range inter-cluster interactions by exchanging partial channel features across neighboring ball regions, thereby expanding receptive fields with minimal overhead. The approach achieves state-of-the-art performance among point-based detectors on KITTI, Waymo, and nuScenes, with competitive runtime. This work provides a practical and scalable mechanism for enhancing geometric feature propagation in sparse 3D data for autonomous driving perception, and it may inspire future cross-cluster information exchange techniques in 3D vision.
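
A minimal sketch of the partial-channel exchange described above follows. It assumes each cluster has already been paired with one neighboring cluster. The function name `cross_cluster_shift`, the `shift_ratio` parameter, and the choice to overwrite the leading channels are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def cross_cluster_shift(
    features: Sequence[Sequence[float]],
    neighbor_idx: Sequence[int],
    shift_ratio: float = 0.25,
) -> List[List[float]]:
    """Hypothetical sketch: replace the first k channels of each cluster's
    feature with the matching channels of its assigned neighbor cluster.

    Assumptions (not from the paper): the leading k channels are the ones
    shifted, and shift_ratio = 0.25 is an arbitrary default.
    """
    if not features:
        return []
    num_channels = len(features[0])
    k = int(num_channels * shift_ratio)  # channels borrowed from the neighbor
    shifted = []
    for i, own in enumerate(features):
        neighbor = features[neighbor_idx[i]]  # read from the unmodified input
        # Borrowed channels carry the neighbor's information; the rest stay local.
        shifted.append(list(neighbor[:k]) + list(own[k:]))
    return shifted


# Three clusters with 4 channels each; cluster i borrows from neighbor_idx[i].
print(cross_cluster_shift([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]], [2, 0, 1], 0.5))
```

In this sketch the shift only copies values and introduces no learnable parameters, which illustrates why such an exchange can be cheap; the paper's module additionally fuses the shifted features with Conv layers (Figure 3).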

Abstract

We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving. Traditional point-based 3D object detectors often employ architectures that rely on progressive downsampling of points. While this design effectively reduces computational demands and increases receptive fields, it compromises the preservation of non-local information that is crucial for accurate 3D object detection, especially in complex driving scenarios. To address this, we introduce a Cross-Cluster Shifting operation that unleashes the representation capacity of the point-based detector by efficiently modeling longer-range inter-dependency while adding only negligible overhead. Concretely, the Cross-Cluster Shifting operation enhances the conventional design by shifting partial channels from neighboring clusters, which enables richer interaction with non-local regions and thus enlarges the receptive field of clusters. We conduct extensive experiments on the KITTI, Waymo, and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD in both detection accuracy and runtime efficiency.
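
The claim that shifting enlarges the receptive field of clusters can be made concrete by tracking which raw points are able to influence each cluster feature. The sketch below is a hedged illustration, not the paper's code: `ball_members`, `neighbor_idx`, and `num_steps` are hypothetical names, and repeating the shift over a fixed set of clusters is a simplification, since the backbone re-selects cluster points in every SSA module.

```python
from typing import List, Set


def receptive_fields_after_shift(
    ball_members: List[Set[int]],
    neighbor_idx: List[int],
    num_steps: int = 1,
) -> List[Set[int]]:
    """Hypothetical sketch: the set of input point indices that can affect
    each cluster feature after num_steps rounds of cross-cluster shifting."""
    # Before any shifting, a cluster only "sees" the points in its own ball.
    fields = [set(members) for members in ball_members]
    for _ in range(num_steps):
        # Synchronous update: each cluster reads its neighbor's field from the
        # previous step, because the borrowed channels encode those points.
        fields = [fields[i] | fields[neighbor_idx[i]] for i in range(len(fields))]
    return fields


balls = [{0, 1}, {2, 3}, {4}]  # point indices grouped into each ball region
print(receptive_fields_after_shift(balls, [1, 2, 0]))
```

With one shift, cluster 0 can be influenced by points {0, 1, 2, 3} instead of only {0, 1}, even though no new grouping was performed.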

Paper Structure (10 sections, 5 equations, 4 figures, 7 tables)

Figures (4)

  • Figure 1: 3D point-based object detectors commonly process point cloud data by first grouping points (orange) around selected cluster points (red) and then summarizing the local points' geometric patterns into the cluster points' features. Our proposed Shift-SSD builds interactions among these independently learned ball regions via Cross-Cluster Shifting. Shifting partial channels of the extracted features from blue to red leads to better intra-instance learning, while shifting from blue to green results in more discriminative cross-instance learning.
  • Figure 2: The upper part of the figure presents the overall model architecture of Shift-SSD and the detailed design of our SSA module. Shift-SSD comprises the Backbone Network and the Box Prediction Network. The Backbone Network takes raw point clouds as input and downsamples them with a stack of our proposed SSA modules, summarizing representative features into a point subset. As illustrated in the lower part of the figure, each SSA module applies Cluster Point Selection, Ball Grouping, and Set Feature Abstraction to summarize local region features into cluster points. Our proposed Cross-Cluster Shifting then enhances these features by exchanging information among the independently learned ball regions. The Box Prediction Network first predicts offsets that shift cluster points towards instance centers with the Vote Layer qi2019deep, and then uses a Set Abstraction Layer to aggregate features. Lastly, the aggregated features are fed to the prediction head to generate bounding boxes with class labels.
  • Figure 3: The pipeline of Cross-Cluster Shifting. The center cluster's features are shown in red. As shown on the left of the figure, we first use farthest neighbor sampling to obtain its farthest neighbor (blue) within range $r^\prime$. Cross-Cluster Shifting then exchanges partial features from the farthest neighbor to the cluster features. The resulting fused features (yellow) are obtained by passing through two Conv layers, followed by an average pooling operation.
  • Figure 4: Qualitative results achieved by Shift-SSD on the validation set of the Waymo Open Dataset sun2020scalability. Ground-truth bounding boxes are shown in red; detected Vehicles are shown in yellow, Pedestrians in green, and Cyclists in cyan.
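
The first two steps of the Figure 3 pipeline, selecting each cluster's farthest neighbor within range $r^\prime$ and then exchanging partial channels from it, can be sketched as below. This is a hedged, brute-force illustration under assumptions not stated in the caption: Euclidean distance between cluster centers, a fallback to the cluster itself when no other center lies within `radius`, and a leading-channel shift controlled by a hypothetical `shift_ratio`. The two Conv layers and the average pooling are omitted because the caption does not specify their configuration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def farthest_neighbor_within(centers: Sequence[Point], radius: float) -> List[int]:
    """For each cluster center, return the index of the farthest other center
    at distance <= radius (assumed fallback: the cluster itself)."""
    result = []
    for i, c in enumerate(centers):
        best_j, best_d = i, -1.0
        for j, other in enumerate(centers):
            if j == i:
                continue
            d = math.dist(c, other)  # Euclidean distance (Python 3.8+)
            if d <= radius and d > best_d:
                best_j, best_d = j, d
        result.append(best_j)
    return result


def shift_from_farthest_neighbor(
    centers: Sequence[Point],
    features: Sequence[Sequence[float]],
    radius: float,
    shift_ratio: float = 0.25,
) -> List[List[float]]:
    """Hypothetical sketch: overwrite the leading channels of each cluster's
    feature with those of its farthest in-range neighbor."""
    nbr = farthest_neighbor_within(centers, radius)
    k = int(len(features[0]) * shift_ratio) if features else 0
    return [list(features[nbr[i]][:k]) + list(features[i][k:]) for i in range(len(features))]


centers = [(0, 0, 0), (1, 0, 0), (3, 0, 0), (10, 0, 0)]
feats = [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]]
print(farthest_neighbor_within(centers, radius=3.5))
print(shift_from_farthest_neighbor(centers, feats, radius=3.5, shift_ratio=0.5))
```

Picking the farthest rather than the nearest in-range neighbor maximizes how much new spatial context the borrowed channels bring, while the radius keeps the exchange within a bounded region; the isolated center at x = 10 simply keeps its own features.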