Optimized CNNs for Rapid 3D Point Cloud Object Recognition

Tianyi Lyu, Dian Gu, Peiyuan Chen, Yaoting Jiang, Zhenhong Zhang, Huadong Pang, Li Zhou, Yiping Dong

TL;DR

The paper tackles rapid object detection in large-scale 3D point clouds. To reduce the computational burden of dense 3D CNNs, it introduces sparse convolutional networks built on a feature-centric voting mechanism, combined with an $L_1$ penalty on activations. The pipeline fuses preprocessing, 3D FPFH (Fast Point Feature Histogram) features, and multi-view 2D features through a sparse graph-CNN and an MLP-based anomaly scorer. It achieves state-of-the-art results on the MVTec 3D-AD dataset (I-ROC 95.15%, P-PRO 92.93%) while running at real-time speeds. The contributions are a complete end-to-end pipeline, explicit sparsity through $L_1$ regularization, and ablations on the number of views, feature fusion, and backbone choice. The work shows that sparse, voting-based architectures are practical for real-time 3D perception in robotics and autonomous driving, and identifies GPU-accelerated voting as a promising direction for further speedups.
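The TL;DR reports an I-ROC of 95.15%. Assuming I-ROC denotes image-level AUROC (the usual image-level metric on MVTec 3D-AD), it is computed from one anomaly score per sample, such as the output of the MLP-based anomaly scorer. The sketch below is a minimal illustration of that metric under this assumption, not the authors' evaluation code; the function name `image_auroc` is hypothetical.

```python
from typing import Sequence


def image_auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Image-level AUROC (hypothetical helper, illustrating an assumed metric).

    Equals the probability that a randomly chosen anomalous sample (label 1)
    receives a higher anomaly score than a randomly chosen normal sample
    (label 0), with ties counted as one half (Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # anomalous sample correctly ranked above normal one
            elif p == n:
                wins += 0.5  # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Usage: image_auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]) -> 0.75
```

The pairwise loop costs O(P·N) for P anomalous and N normal samples; a sort-based rank computation gives the same value in O(n log n) for large test sets.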

Abstract

This study introduces a method for efficiently detecting objects in 3D point clouds using convolutional neural networks (CNNs). Our approach uses a feature-centric voting mechanism to build convolutional layers that exploit the sparsity typical of the input data. We explore the trade-off between accuracy and speed across different network architectures and propose an $L_1$ penalty on filter activations to further increase sparsity in the intermediate layers. This work is the first to propose sparse convolutional layers combined with $L_1$ regularization for large-scale 3D data processing. We demonstrate the method's efficacy on the MVTec 3D-AD anomaly detection benchmark. With only three layers, the Vote3Deep models outperform the previous state of the art among both laser-only and combined laser-vision methods, while maintaining competitive processing speeds. These results show that the approach substantially improves detection performance while remaining efficient enough for real-time applications.
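The voting idea described in the abstract can be illustrated on a sparse voxel grid. This is a minimal sketch under assumptions that are illustrative, not details taken from the paper: occupied cells are stored in a dictionary, the filter bank is indexed by 3D offset, and there is no bias term. Each non-empty input cell casts weighted votes to the output cells its filter taps reach, so the work grows with the number of occupied cells rather than with the grid volume. The `l1_activation_penalty` helper shows the form of the $L_1$ term added to the training loss; all function names and the weight `lam` are hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]
# Sparse grid: only occupied voxels are stored, each with a feature vector.
SparseGrid = Dict[Coord, List[float]]
# Filter bank: offset d -> weight matrix W[d] (rows = output channels).
Filters = Dict[Coord, List[List[float]]]


def voting_conv(grid: SparseGrid, filters: Filters, out_channels: int) -> SparseGrid:
    """Sparse 3D cross-correlation computed by voting, followed by ReLU.

    Dense definition: out[y] = sum_d W[d] @ f(y + d). Substituting x = y + d,
    every occupied cell x casts the vote W[d] @ f(x) to output cell x - d.
    Hypothetical sketch with no bias term.
    """
    out: SparseGrid = {}
    for (x, y, z), feat in grid.items():
        for (dx, dy, dz), weight in filters.items():
            target = (x - dx, y - dy, z - dz)
            acc = out.setdefault(target, [0.0] * out_channels)
            for c, row in enumerate(weight):
                acc[c] += sum(w * f for w, f in zip(row, feat))
    # ReLU. Cells with no votes are implicitly zero and ReLU(0) = 0, so
    # dropping cells whose outputs are all non-positive keeps the result exact.
    return {k: [max(v, 0.0) for v in vec] for k, vec in out.items()
            if any(v > 0.0 for v in vec)}


def l1_activation_penalty(activations: SparseGrid, lam: float) -> float:
    """L1 term lam * sum |a| added to the loss to push activations to zero."""
    return lam * sum(abs(v) for vec in activations.values() for v in vec)


# Usage on a 1-channel grid with a 3-tap filter along x:
grid = {(0, 0, 0): [1.0], (2, 0, 0): [2.0]}
filters = {(-1, 0, 0): [[1.0]], (0, 0, 0): [[1.0]], (1, 0, 0): [[-1.0]]}
out = voting_conv(grid, filters, out_channels=1)
# out == {(0, 0, 0): [1.0], (2, 0, 0): [2.0], (3, 0, 0): [2.0]}
penalty = l1_activation_penalty(out, lam=0.5)  # 2.5
```

Because only occupied cells vote, every activation the $L_1$ term drives to zero also removes a voter from the next layer, which is why the penalty translates into speed as well as sparsity.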

Paper Structure

This paper contains 18 sections, 18 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Our architecture.
  • Figure 2: Visualization of prediction results.
  • Figure 3: Comparisons of the anomaly detection performances under different types of features, views, and backbones.
  • Figure 4: Samples for the influence of different views.
  • Figure 5: Visualization for feature distributions.
  • ...and 1 more figure