CascadeV-Det: Cascade Point Voting for 3D Object Detection

Yingping Liang; Ying Fu

TL;DR

This work tackles the challenge of positive sampling and accurate regression in anchor-free 3D detectors when 3D points are distant from ground-truth centers. It proposes CascadeV-Det, a cascade voting detector with Instance Aware Voting (IA-Voting) for instance-aware feature updating and Cascade Positive Assignment (CPA) for progressively stricter training positives, complemented by optional image-feature fusion via Deformable Attention. The cascade decoder updates proposal points toward predicted centers and refines features across stages, with the positive threshold following $\mu_l = \mu_{\max} - \frac{l}{L}(\mu_{\max} - \mu_{\min})$ and denoising guidance to stabilize training. On SUN RGB-D, CascadeV-Det achieves state-of-the-art results with mAP@0.25 of $70.4\%$ and mAP@0.5 of $51.6\%$, and shows competitive gains on ScanNet, demonstrating the effectiveness of cascade updating and cross-modal fusion for high-quality 3D object detection from point clouds.
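The threshold schedule in the summary above is a linear interpolation from the loosest threshold to the strictest one. The sketch below only illustrates that formula. The function name `cpa_threshold`, the indexing convention (stage 0 gives the maximum, stage `num_stages` gives the minimum), and the example values 0.6 and 0.4 are assumptions made for illustration, not settings taken from the paper.

```python
def cpa_threshold(stage, num_stages, mu_max, mu_min):
    """Linear schedule: mu_l = mu_max - (l / L) * (mu_max - mu_min).

    All names here are hypothetical. `stage` plays the role of l and
    `num_stages` the role of L in the formula.
    """
    if num_stages <= 0:
        raise ValueError("num_stages must be positive")
    if not 0 <= stage <= num_stages:
        raise ValueError("stage must lie in [0, num_stages]")
    return mu_max - (stage / num_stages) * (mu_max - mu_min)


# Illustrative values only: three stages, threshold shrinking from 0.6 to 0.4.
schedule = [cpa_threshold(l, 3, mu_max=0.6, mu_min=0.4) for l in range(1, 4)]
```

With this convention each later stage receives a smaller, stricter threshold, and the last stage lands on the minimum.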

Abstract

Anchor-free object detectors are highly efficient in performing point-based prediction without the need for extra post-processing of anchors. However, unlike 2D grids, the 3D points used in these detectors are often far from the ground truth center, making it challenging to accurately regress the bounding boxes. To address this issue, we propose a Cascade Voting (CascadeV) strategy that provides high-quality 3D object detection with point-based prediction. Specifically, CascadeV performs cascade detection using a novel Cascade Voting decoder that combines two new components: Instance Aware Voting (IA-Voting) and a Cascade Point Assignment (CPA) module. The IA-Voting module updates the object features of updated proposal points within the bounding box using conditional inverse distance weighting. This approach prevents features from being aggregated outside the instance and helps improve the accuracy of object detection. Additionally, since model training can suffer from a lack of proposal points with high centerness, we have developed the CPA module to narrow down the positive assignment threshold across cascade stages. This approach relaxes the dependence on proposal centerness in the early stages while ensuring an ample quantity of positives with high centerness in the later stages. Experiments show that FCAF3D with our CascadeV achieves state-of-the-art 3D object detection results with 70.4% mAP@0.25 and 51.6% mAP@0.5 on SUN RGB-D and competitive results on ScanNet. Code will be released at https://github.com/Sharpiless/CascadeV-Det
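To make the idea of conditional inverse distance weighting concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes an axis-aligned box, takes the "condition" to mean that only points inside the predicted box contribute, and uses hypothetical names (`ia_voting_sketch`, `seed_xyz`, `seed_feats`, `eps`).

```python
import math


def ia_voting_sketch(center, box_min, box_max, seed_xyz, seed_feats, eps=1e-8):
    """Aggregate features of points inside a box with inverse-distance weights.

    Assumptions (not from the paper): the box is axis-aligned, weights are
    1 / (distance to `center` + eps), and the weights are normalized to sum to 1.
    Returns None when no point falls inside the box.
    """
    weights, feats = [], []
    for xyz, feat in zip(seed_xyz, seed_feats):
        # Condition: skip points outside the predicted box so that features
        # from outside the instance are never aggregated.
        inside = all(lo <= c <= hi for c, lo, hi in zip(xyz, box_min, box_max))
        if not inside:
            continue
        weights.append(1.0 / (math.dist(xyz, center) + eps))
        feats.append(feat)
    if not weights:
        return None
    total = sum(weights)
    dim = len(feats[0])
    return [sum(w * f[d] for w, f in zip(weights, feats)) / total for d in range(dim)]


# Two points inside the box at equal distance, one far outside that is ignored.
fused = ia_voting_sketch(
    center=(0.0, 0.0, 0.0),
    box_min=(-1.0, -1.0, -1.0),
    box_max=(1.0, 1.0, 1.0),
    seed_xyz=[(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0), (5.0, 0.0, 0.0)],
    seed_feats=[[1.0, 0.0], [0.0, 1.0], [100.0, 100.0]],
)
```

In this toy case the two interior points are equidistant from the center, so they contribute equally, and the distant point has no influence on the fused feature.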

Paper Structure (13 sections, 9 equations, 8 figures, 5 tables)

Introduction
Related Work
The Proposed CascadeV-Det
Formulation and Motivation
Instance-Aware Voting
Cascade Positive Assignment
Fusing Image Features
Experiments
Experimental Settings
Comparing with State-of-the-art Methods
Ablation Study
Qualitative Results and Discussion
Conclusion

Figures (8)

Figure 1: Points in red are the harmful object features outside the real target and the point colors from light to dark indicate the direction to vote. All points in the black ball of voting are assigned to derive the object features, resulting in aggregated features outside the real instance. Per-point prediction can also be hard to regress accurately when directly predicting bounding boxes with points far from the ground truth centers. Our method allows us to update the proposal points and object features in a cascade voting manner trained with abundant high quality positives. For clarity, we show the methods from a BEV perspective.
Figure 2: The framework of CascadeV-Det with a point encoder and a novel cascade voting decoder with IA-Voting modules. An extra CPA strategy is also used for training. Proposal points are first selected from the point encoder. Then the object features are updated by the IA-Voting module and fed into transformer layers with attention modules for feature refinement with per-stage predictions from the detection heads. The threshold for positives in the CPA strategy decreases stage by stage, providing stricter positive supervision with abundant high quality positives in the deeper stages.
Figure 3: The left shows the bounding box from a single proposal point with predicted $\boldsymbol{\delta}$. The right shows the updating process from the proposal point (gray) to the bounding box center (blue). The centers of the predicted bounding boxes are generally closer to the ground truth centers than the base proposal points.
Figure 4: The matched positive samples in the training process, in which the unmatched proposal points are shown in gray. The stars represent the denoising points (queries) with the minimum distance from the ground truth centers for denoising training. The arrow direction indicates the updated location of the proposal point. The dashed box indicates the threshold range used to select positive samples. Note that, unlike IA-Voting, this matching strategy only takes effect during training and is used to assign targets to proposal points.
Figure 5: (a) shows that the number of matched proposal points drops sharply when the threshold for positives is directly decreased from 0.5 (blue) to 0.4 (gold), which indicates that directly reducing the threshold to remove noisy proposal points is not feasible because too few positives remain for training. (b) shows that in the first stage of our proposed CPA strategy, more points can be matched as positives than in (a) because $\mu>0.5$, and the centerness of these positives is further improved, with higher expected IoU, in the 2$^{nd}$ and 3$^{rd}$ stages with IA-Voting modules. Thus we can decrease $\mu$ stage by stage to set up a stricter assignment with a sufficient number of high-centerness positives in the deeper stages.
...and 3 more figures
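
The update step described in the Figure 3 caption, moving a proposal point to the center of the box it predicts, can be sketched as follows. The parameterization is an assumption: the captions do not define $\boldsymbol{\delta}$, so this example treats it as six non-negative distances from the point to the faces of an axis-aligned box, a common choice in anchor-free detectors. The function name `update_proposal_point` is hypothetical.

```python
def update_proposal_point(point, delta):
    """Move a proposal point to the center of its predicted box.

    Assumed (hypothetical) layout of `delta`:
    (x_lo, x_hi, y_lo, y_hi, z_lo, z_hi), the distances from `point`
    to the lower and upper box faces along each axis.
    """
    box_min = tuple(p - delta[2 * i] for i, p in enumerate(point))
    box_max = tuple(p + delta[2 * i + 1] for i, p in enumerate(point))
    # The updated proposal point is the midpoint of the predicted box.
    center = tuple((lo + hi) / 2 for lo, hi in zip(box_min, box_max))
    return center, box_min, box_max


# A point that sits off-center along x is shifted toward the box center.
center, box_min, box_max = update_proposal_point(
    point=(1.0, 2.0, 0.5),
    delta=(0.5, 1.5, 1.0, 1.0, 0.5, 0.5),
)
```

Under this parameterization a point whose face distances are symmetric stays where it is, while an off-center point shifts by half the difference between its two distances along each axis.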
