SGCCNet: Single-Stage 3D Object Detector With Saliency-Guided Data Augmentation and Confidence Correction Mechanism

Ao Liang, Wenyu Chen, Jian Fang, Huaici Zhao

TL;DR

SGCCNet targets two core issues in single-stage point-based 3D detectors: inadequate learning of low-quality objects (ILQ), addressed via Saliency-Guided Data Augmentation (SGDA), which drops salient points to force the model to learn from low-saliency regions, and misalignment between localization accuracy and classification confidence (MLC), tackled with a Confidence Correction Mechanism (CCM) that calibrates proposal confidence using neighboring vote-point predictions. The backbone incorporates a Geometric Normalization Module (GNM) and a Skip Connection Block (SCB) to curb internal covariate shift and feature forgetting, while end-to-end training combines semantic, vote, and IoU-informed losses to align localization and confidence. On the KITTI test set, SGCCNet reports the best results among point-based detectors (e.g., $AP_{3D}^{40}$ of $80.82\%$ for Car at the Moderate level) and remains portable to other architectures such as 3DSSD and SASA, with favorable runtime. Overall, the methods yield more robust learning for low-quality targets and better-calibrated confidence for NMS decisions, which matters for real-time LiDAR perception in autonomous systems.

Abstract

Single-stage point-based 3D object detectors have attracted widespread research interest because they are lightweight and fast at inference. However, they still face challenges such as inadequate learning of low-quality objects (ILQ) and misalignment between localization accuracy and classification confidence (MLC). In this paper, we propose SGCCNet to alleviate these two issues. For ILQ, SGCCNet adopts a Saliency-Guided Data Augmentation (SGDA) strategy that makes the model more robust on low-quality objects by reducing its reliance on salient features. Specifically, we construct a classification task and then approximate the saliency scores of points by moving them towards the point cloud centroid in a differentiable process. During training, salient points are dropped, which forces SGCCNet to learn from low-saliency features. Meanwhile, to avoid the internal covariate shift and the forgetting of contextual features caused by dropping points, we add a geometric normalization module and a skip connection block in each stage. For MLC, we design a Confidence Correction Mechanism (CCM) specifically for point-based multi-class detectors. In the post-processing stage, this mechanism corrects the confidence of the current proposal using the predictions of other key points within the local region. Extensive experiments on the KITTI dataset demonstrate the generality and effectiveness of SGCCNet. On the KITTI test set, SGCCNet achieves $80.82\%$ $AP_{3D}$ at the Moderate level, outperforming all other point-based detectors and surpassing IA-SSD and Fast Point R-CNN by $2.35\%$ and $3.42\%$, respectively. Additionally, SGCCNet demonstrates excellent portability to other point-based detectors.
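The saliency step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: `toy_loss`, `saliency_by_centroid_shift`, `drop_most_salient`, the step size `eps`, and `drop_ratio` are all hypothetical names and values chosen for illustration. The abstract describes a differentiable process; to stay dependency-light, this sketch substitutes a finite-difference estimate of how the loss changes when a single point is nudged towards the centroid, and it uses a toy max-pooled loss in place of a trained classifier.

```python
import numpy as np


def toy_loss(points, weights):
    """Hypothetical stand-in for a classifier loss (not the paper's network)."""
    feats = np.tanh(points @ weights)   # (N, C) per-point features
    pooled = feats.max(axis=0)          # (C,) symmetric max pooling over points
    return -float(pooled.sum())         # lower loss = stronger pooled response


def saliency_by_centroid_shift(points, loss_fn, eps=1e-2):
    """Score each point by the loss change when it moves towards the centroid.

    Assumption: a finite difference replaces the gradient-based estimate,
    and the centroid is held fixed while a single point is shifted.
    """
    centroid = points.mean(axis=0)
    base = loss_fn(points)
    scores = np.empty(len(points))
    for i in range(len(points)):
        shifted = points.copy()
        shifted[i] += eps * (centroid - points[i])  # small step towards centroid
        scores[i] = (loss_fn(shifted) - base) / eps
    return scores


def drop_most_salient(points, scores, drop_ratio=0.1):
    """Remove the highest-scoring points, keeping the original point order."""
    n_drop = int(round(drop_ratio * len(points)))
    keep = np.argsort(scores)[: len(points) - n_drop]
    return points[np.sort(keep)]


rng = np.random.default_rng(0)
cloud = rng.normal(size=(64, 3))        # synthetic point cloud, (N, 3)
weights = rng.normal(size=(3, 8))       # random toy feature weights
scores = saliency_by_centroid_shift(cloud, lambda p: toy_loss(p, weights))
augmented = drop_most_salient(cloud, scores, drop_ratio=0.1)
print(cloud.shape, augmented.shape)
```

A point whose shift towards the centroid raises the loss gets a high score, on the reasoning (stated in the caption of Figure 4) that shifting a point towards the centroid is similar to discarding it. With a real detector, the finite difference would be replaced by a gradient from automatic differentiation, and the dropping would be applied only during training.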

Paper Structure (20 sections, 14 equations, 16 figures, 9 tables, 2 algorithms)

Figures (16)

  • Figure 1: Visualization of the saliency of three classes of objects in KITTI. The model's reliance on highly salient features is detrimental to the detection of low-quality objects.
  • Figure 2: Three typical scenarios of MLC in point-based single-stage 3D detectors. (a) False positive targets. (b) Suboptimal predicted boxes. (c) Missed accurately located targets.
  • Figure 3: Overview of proposed SGCCNet. SGCCNet adopts a PointNet++-style 3D backbone to learn point cloud features. In addition, SGCCNet consists of three core components, namely a saliency-guided data augmentation strategy, SA layer with geometric normalization modules and skip connection blocks, and a confidence correction mechanism during post-processing.
  • Figure 4: (a) Overview of the proposed SGCCNet-elite for the classification task. (b) Shifting a point towards the centroid is similar to discarding it, and because the movement is differentiable, it can be used to approximate the point's saliency score.
  • Figure 5: Changes in the scenes before and after dropping points. Points of the Car, Pedestrian, Cyclist, and Background classes are shown in green, yellow, red, and blue, respectively.
  • ...and 11 more figures