Optimized CNNs for Rapid 3D Point Cloud Object Recognition
Tianyi Lyu, Dian Gu, Peiyuan Chen, Yaoting Jiang, Zhenhong Zhang, Huadong Pang, Li Zhou, Yiping Dong
TL;DR
The paper tackles rapid object detection in large-scale 3D point clouds, addressing the computational burden of dense 3D CNNs by introducing sparse convolutional networks driven by a feature-centric voting mechanism and an $\mathcal{L}_1$ penalty on activations. It integrates preprocessing, 3D FPFH features, and multi-view 2D features through a sparse graph-CNN and an MLP-based anomaly scorer, achieving state-of-the-art performance on the MVTec 3D-AD dataset with an I-ROC of 95.15% and a P-PRO of 92.93% while preserving real-time speeds. The contributions include a complete end-to-end pipeline, explicit sparsity through $\mathcal{L}_1$ regularization, and comprehensive ablations validating the number of views, the feature fusion, and the backbone choice. This work demonstrates the practicality of sparse, voting-based architectures for real-time 3D perception in robotics and autonomous driving, and points to GPU-accelerated voting as a promising direction for further gains.
Abstract
This study introduces a method for efficiently detecting objects in 3D point clouds using convolutional neural networks (CNNs). Our approach uses a feature-centric voting mechanism to construct convolutional layers that exploit the sparsity typical of the input data. We explore the trade-off between accuracy and speed across different network architectures and advocate an $\mathcal{L}_1$ penalty on filter activations to increase sparsity in the intermediate layers. This work is the first to propose sparse convolutional layers combined with $\mathcal{L}_1$ regularization for processing large-scale 3D data. We demonstrate the method on the MVTec 3D-AD object detection benchmark. The Vote3Deep models, with just three layers, outperform the previous state of the art among both laser-only approaches and combined laser-vision methods, while maintaining competitive processing speeds. These results show that the approach can substantially improve detection performance while remaining computationally efficient enough for real-time applications.
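To make the two ideas in the abstract concrete, the following is a minimal sketch of a voting-based sparse convolution and an $\mathcal{L}_1$ activation penalty. It is an illustration under stated assumptions, not the authors' implementation: the function names `voting_conv3d` and `l1_penalty`, the dictionary-based sparse grid, the omission of a bias term, and the penalty weight `lam` are all hypothetical choices made here. Each occupied cell casts weighted votes to its neighbours, so the work scales with the number of occupied cells rather than with the full grid volume.

```python
from itertools import product


def voting_conv3d(cells, weights, grid_shape):
    """Sparse 3D convolution by feature-centric voting (illustrative sketch).

    cells:      dict {(x, y, z): [C_in floats]} holding only occupied cells
    weights:    dict {(dx, dy, dz): C_in x C_out nested list}, one per kernel offset
    grid_shape: (X, Y, Z) bounds of the grid
    Returns a dict of output cells with non-zero activation after ReLU.
    """
    votes = {}
    for src, feat in cells.items():
        for d, w in weights.items():
            # A dense filter centred at t reads input cell t + d with weight w[d],
            # so the occupied cell src votes for the target t = src - d.
            tgt = tuple(s - o for s, o in zip(src, d))
            if not all(0 <= t < n for t, n in zip(tgt, grid_shape)):
                continue
            acc = votes.setdefault(tgt, [0.0] * len(w[0]))
            for i, f in enumerate(feat):
                for j, w_ij in enumerate(w[i]):
                    acc[j] += f * w_ij
    # ReLU; cells whose activations are all zero are dropped to keep the output sparse.
    out = {}
    for tgt, acc in votes.items():
        act = [max(a, 0.0) for a in acc]
        if any(act):
            out[tgt] = act
    return out


def l1_penalty(cells, lam=1e-3):
    """L1 penalty on activations: lam times the sum of absolute values."""
    return lam * sum(abs(a) for act in cells.values() for a in act)


# Hypothetical example: one occupied cell in a 3x3x3 grid, 1 input and 1 output channel.
kernel = {d: [[1.0]] for d in product((-1, 0, 1), repeat=3)}
kernel[(1, 0, 0)] = [[-1.0]]  # one negative weight, so one vote is zeroed by the ReLU
layer_out = voting_conv3d({(1, 1, 1): [2.0]}, kernel, (3, 3, 3))
print(len(layer_out), l1_penalty(layer_out, lam=0.5))
```

In training, a term like `l1_penalty` would be added to the loss for each intermediate layer, which pushes more activations to exactly zero after the ReLU and so reduces the number of votes the next layer has to cast.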
