GraVoS: Voxel Selection for 3D Point-Cloud Detection

Oren Shrout; Yizhak Ben-Shabat; Ayellet Tal

TL;DR

This work proposes to modify scenes by removing elements (voxels) rather than adding them, in a manner that addresses both types of dataset imbalance in 3D object detection: foreground-background imbalance and class imbalance.

Abstract

3D object detection within large 3D scenes is challenging not only due to the sparsity and irregularity of 3D point clouds, but also due to the extreme foreground-background imbalance within scenes and the imbalance between classes. A common approach is to add ground-truth objects from other scenes. Instead, we propose to modify the scenes by removing elements (voxels) rather than adding them. Our approach selects the "meaningful" voxels in a manner that addresses both types of dataset imbalance. The approach is general and can be applied to any voxel-based detector, yet the meaningfulness of a voxel is network-dependent. Our voxel selection is shown to improve the performance of several prominent 3D detection methods.

Paper Structure

This paper contains 7 sections, 4 equations, 6 figures, and 4 tables.

Introduction
Related Work
Gradient-based Voxel Selection (GraVoS)
Experiments
Results
Ablation Study
Conclusion

Figures (6)

Figure 1: Training with GraVoS. An input point cloud (cyan) is voxelized and fed into a pre-trained voxel-based detector at two different training stages, early and late (with frozen weights). The losses of these two detectors are computed and serve as the input to GraVoS, which performs voxel selection. The selected voxels (salmon) are then fed into the late-stage detector, which is initialized with the weights at which it left off ($\theta_l$) and continues training exclusively on the selected voxels. Here, $f(\cdot)$ is a voxel-based network.
Figure 2: GraVoS Module. The voxelized point cloud is fed into the GraVoS module and into the pre-trained detector (at two training stages). The detectors' losses are computed and fed into the GraVoS module, which uses them to compute the gradient magnitude at each voxel. For each detector stage, voxels are selected based on the magnitude of their gradients. The selected voxels (highlighted in salmon) from the two stages are then merged to form the final selected subset of voxels, $S^{mf}$.
Figure 3: Gradient-based voxel selection. Given an input point cloud (a), the gradient magnitudes are computed (b), and the subset of voxels is selected (c). Gradient magnitude is depicted as a colormap from blue to red (low to high values). The gradients on the background voxels are lower, so these voxels are less likely to be selected than foreground voxels. The objects' voxels have high gradients, and therefore most of them are retained in the final subset. However, the classes differ: the less prominent classes, Cyclist (middle) and Pedestrian (bottom), retain relatively more voxels than the prominent class Car (top).
Figure 4: Incorporating GraVoS into two-stage detectors. Voxel selection is performed during the first stage, as before. Since a two-stage detector consists of a proposal generator and a refinement module, we use the detector without its refinement component in the first stage. The final refinement module (in the second stage) receives the output of the proposal generator (green arrow), as well as the required local data, which is bypassed through GraVoS.
Figure 5: Comparison to alternative approaches. GraVoS is compared to Dropout, BgSampling and InvFreqSampling for different voxel selection ratios. The baseline is the detector using all voxels, so its performance is constant across ratios. When too few voxels are used (ratio $<0.7$), the detector misses objects, as expected. For ratios larger than $0.7$, GraVoS significantly outperforms the other approaches, because it retains the meaningful voxels.
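The per-stage selection and merge described for the GraVoS module (Figure 2) can be illustrated with a small NumPy sketch. It rests on several assumptions that are not taken from the paper: a toy linear model with a squared loss stands in for the frozen detector, so the gradient of the loss with respect to each voxel's features can be written in closed form; each stage keeps the top `ratio` fraction of voxels by gradient magnitude (echoing the "voxel selection ratio" of Figure 5); and the two stage subsets are merged by set union. The function names `voxel_gradient_magnitudes`, `select_top_ratio` and `gravos_like_selection` are hypothetical.

```python
import numpy as np


def voxel_gradient_magnitudes(features, weights, targets):
    """Per-voxel gradient magnitude for a toy linear model (hypothetical detector stand-in).

    Loss: L = 0.5 * sum_i (features[i] @ weights - targets[i]) ** 2
    Gradient w.r.t. voxel i's features: (features[i] @ weights - targets[i]) * weights
    """
    residuals = features @ weights - targets        # shape (N,)
    grads = residuals[:, None] * weights[None, :]   # shape (N, C), one gradient row per voxel
    return np.linalg.norm(grads, axis=1)            # shape (N,), L2 norm per voxel


def select_top_ratio(magnitudes, ratio):
    """Assumed rule: indices of the ceil(ratio * N) voxels with the largest magnitudes."""
    k = int(np.ceil(ratio * len(magnitudes)))
    # Stable sort so that ties are broken deterministically by voxel index.
    order = np.argsort(-magnitudes, kind="stable")
    return set(order[:k].tolist())


def gravos_like_selection(features, targets, w_early, w_late, ratio):
    """Select voxels for the early and late (frozen) stages, then merge by union (assumed)."""
    early = select_top_ratio(voxel_gradient_magnitudes(features, w_early, targets), ratio)
    late = select_top_ratio(voxel_gradient_magnitudes(features, w_late, targets), ratio)
    return sorted(early | late)


if __name__ == "__main__":
    feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
    tgts = np.array([1.0, 0.0, 0.0, 0.0])
    print(gravos_like_selection(feats, tgts, np.array([1.0, 1.0]), np.array([2.0, 0.0]), 0.5))
```

With a union merge, the final subset can contain more than `ratio * N` voxels whenever the two stages disagree; in this toy run the early stage keeps voxels 1 and 2, the late stage keeps 0 and 2, and the merged subset is `[0, 1, 2]`.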