CascadeV-Det: Cascade Point Voting for 3D Object Detection
Yingping Liang, Ying Fu
TL;DR
This work tackles the challenge of positive sampling and accurate regression in anchor-free 3D detectors when 3D points are distant from ground-truth centers. It proposes CascadeV-Det, a cascade voting detector with Instance Aware Voting (IA-Voting) for instance-aware feature updating and Cascade Positive Assignment (CPA) for progressively stricter training positives, complemented by optional image-feature fusion via Deformable Attention. The cascade decoder updates proposal points toward predicted centers and refines features across stages, with the positive threshold following $\boldsymbol{\mu}_l = \boldsymbol{\mu}_{\max} - \frac{l}{L}(\boldsymbol{\mu}_{\max} - \boldsymbol{\mu}_{\min})$ and denoising guidance to stabilize training. On SUN RGB-D, CascadeV-Det achieves state-of-the-art results with mAP@0.25 of $70.4\%$ and mAP@0.5 of $51.6\%$, and it shows competitive results on ScanNet, demonstrating the effectiveness of cascade updating and cross-modal fusion for high-quality 3D object detection from point clouds.
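The threshold schedule above is a linear interpolation from $\boldsymbol{\mu}_{\max}$ to $\boldsymbol{\mu}_{\min}$. The sketch below only evaluates that formula; it assumes $l$ is the cascade stage index and $L$ the number of stages, which the summary does not state explicitly. The function name `positive_threshold` and the numeric values are hypothetical, and how the threshold is compared against proposal points is not specified here.

```python
def positive_threshold(stage, num_stages, mu_max, mu_min):
    """Evaluate mu_l = mu_max - (l / L) * (mu_max - mu_min).

    stage (l) and num_stages (L) are assumed to be the cascade stage
    index and the total number of stages; the names are illustrative.
    """
    if num_stages <= 0:
        raise ValueError("num_stages must be positive")
    if not 0 <= stage <= num_stages:
        raise ValueError("stage must lie in [0, num_stages]")
    return mu_max - (stage / num_stages) * (mu_max - mu_min)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for l in range(5):
        print(l, positive_threshold(l, 4, mu_max=1.0, mu_min=0.2))
```

With these illustrative values the threshold starts at `mu_max` for stage 0 and shrinks by the same amount at every stage until it reaches `mu_min` at the last stage.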
Abstract
Anchor-free object detectors are highly efficient because they perform point-based prediction without extra post-processing of anchors. However, unlike 2D grids, the 3D points used in these detectors are often far from the ground-truth center, which makes it challenging to regress the bounding boxes accurately. To address this issue, we propose a Cascade Voting (CascadeV) strategy that provides high-quality 3D object detection with point-based prediction. Specifically, CascadeV performs cascade detection using a novel Cascade Voting decoder that combines two new components: an Instance Aware Voting (IA-Voting) module and a Cascade Point Assignment (CPA) module. The IA-Voting module updates the object features of the updated proposal points within the bounding box using conditional inverse distance weighting. This prevents features from being aggregated from outside the instance and helps improve detection accuracy. Additionally, since model training can suffer from a lack of proposal points with high centerness, we develop the CPA module to narrow the positive assignment threshold over the cascade stages. This relaxes the dependence on proposal centerness in the early stages while ensuring an ample quantity of positives with high centerness in the later stages. Experiments show that FCAF3D with our CascadeV achieves state-of-the-art 3D object detection results, with 70.4% mAP@0.25 and 51.6% mAP@0.5 on SUN RGB-D, and competitive results on ScanNet. Code will be released at https://github.com/Sharpiless/CascadeV-Det
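As a rough illustration of inverse distance weighting restricted to a box, the sketch below averages the features of neighbouring points with weights 1/distance and skips every point that falls outside the box. It is not the IA-Voting module itself: the abstract does not give the exact formulation, and the axis-aligned box, the function names `inside_box` and `conditional_idw`, and the `eps` term are all assumptions made for this example.

```python
import math


def inside_box(point, box_min, box_max):
    """True if point lies inside the axis-aligned box (an assumed box form)."""
    return all(lo <= p <= hi for p, lo, hi in zip(point, box_min, box_max))


def conditional_idw(query, points, feats, box_min, box_max, eps=1e-8):
    """Inverse-distance-weighted feature average over points inside the box.

    Points outside the box contribute nothing, so features are not
    aggregated from outside the (assumed) instance extent.
    Returns None when no point lies inside the box.
    """
    weighted_sum = None
    weight_total = 0.0
    for p, f in zip(points, feats):
        if not inside_box(p, box_min, box_max):
            continue  # the "conditional" part: ignore points outside the box
        w = 1.0 / (math.dist(query, p) + eps)  # eps avoids division by zero
        weight_total += w
        if weighted_sum is None:
            weighted_sum = [w * x for x in f]
        else:
            weighted_sum = [acc + w * x for acc, x in zip(weighted_sum, f)]
    if weighted_sum is None:
        return None
    return [acc / weight_total for acc in weighted_sum]


if __name__ == "__main__":
    pts = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    fts = [[1.0], [4.0], [100.0]]
    # The third point is outside the box and is ignored.
    print(conditional_idw((0.0, 0.0, 0.0), pts, fts, (-3, -3, -3), (3, 3, 3)))
```

In the usage example, the distant point with feature 100.0 lies outside the box, so it does not influence the aggregated feature even though an unconditional weighting would include it.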
