RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation

Zhiyuan Zhang, Licheng Yang, Zhiyu Xiang

TL;DR

This work tackles the challenge of rotation invariance in 3D point cloud classification and segmentation by introducing RISurConv, an attention-augmented convolution that operates on Rotation Invariant Surface Properties (RISP) computed from local dual-triangle surfaces. The method formulates RISurConv as $f(\Omega) = \mathrm{SA}(\sigma(\mathcal{A}(\{\mathcal{T}(f_{x_i})\})))$ with $\mathcal{T}(f_{x_i}) = w_i \cdot f_{x_i}$, where the input features $f_{x_i}$ are rotation invariant and two self-attention (SA) layers refine the learned features. The authors define RISP as a 14-dimensional descriptor that encodes distances, angular relations, and normal directions for two adjacent triangles, and prove its completeness for describing local surface geometry. Empirically, RISurConv achieves state-of-the-art results on ModelNet40, ScanObjectNN, FG3D, and ShapeNet across rotation scenarios, narrowing the gap with non-rotation-invariant methods or surpassing them while maintaining rotational robustness. The approach generalizes well to real-world data, provides detailed ablations to justify the design choices, and offers a practical rotation-invariant alternative for 3D point cloud analysis with potential impact on robotics and AR/VR applications.
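The operator formula in the TL;DR can be read as a four-step pipeline: transform each neighbor feature, aggregate over the neighborhood, apply a nonlinearity, and refine with self-attention. The NumPy sketch below illustrates only that composition. It is not the authors' implementation: the shared weight matrix `w`, the ReLU nonlinearity, max pooling as the aggregation, the single-head attention with a residual connection, and all function names are assumptions made for illustration.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the rows of x (assumed form of SA)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return x + weights @ v  # residual connection (an assumption)

def risurconv_sketch(feats, w, wq, wk, wv):
    """feats: (M, K, C_in) rotation-invariant features for M reference points, K neighbors each."""
    transformed = feats @ w                   # T: shared linear map, shape (M, K, C_out)
    aggregated = transformed.max(axis=1)      # A: max pooling over the K neighbors
    activated = np.maximum(aggregated, 0.0)   # sigma: ReLU (an assumption)
    return self_attention(activated, wq, wk, wv)  # SA across the M reference points

rng = np.random.default_rng(0)
M, K, C_in, C_out = 4, 8, 14, 16              # 14 matches the RISP dimension; the rest is arbitrary
feats = rng.normal(size=(M, K, C_in))
w = rng.normal(size=(C_in, C_out))
wq, wk, wv = (0.1 * rng.normal(size=(C_out, C_out)) for _ in range(3))

out = risurconv_sketch(feats, w, wq, wk, wv)
out_perm = risurconv_sketch(feats[:, rng.permutation(K)], w, wq, wk, wv)
print(out.shape, np.allclose(out, out_perm))
```

Because the aggregation is a maximum over neighbors, reordering the neighbors leaves the output unchanged. Rotation invariance is not created by these layers; it is inherited from the input features, which is why the choice of descriptor matters.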

Abstract

Despite the progress on 3D point cloud deep learning, most prior works focus on learning features that are invariant to translation and point permutation, and very limited effort has been devoted to rotation invariance. Several recent studies achieve rotation invariance at the cost of lower accuracy. In this work, we close this gap by proposing a novel yet effective rotation invariant architecture for 3D point cloud classification and segmentation. Instead of traditional pointwise operations, we construct local triangle surfaces to capture more detailed surface structure. From these surfaces we extract highly expressive rotation invariant surface properties, which are then integrated into an attention-augmented convolution operator named RISurConv to generate refined attention features via self-attention layers. Based on RISurConv, we build an effective neural network for 3D point cloud analysis that is invariant to arbitrary rotations while maintaining high accuracy. We verify the performance on various benchmarks and obtain results that surpass the previous state of the art by a large margin. We achieve an overall accuracy of 96.0% (+4.7%) on ModelNet40, 93.1% (+12.8%) on ScanObjectNN, and class accuracies of 91.5% (+3.6%), 82.7% (+5.1%), and 78.5% (+9.2%) on the three categories of the FG3D dataset for the fine-grained classification task. Additionally, we achieve 81.5% (+1.0%) mIoU on ShapeNet for the segmentation task. Code is available here: https://github.com/cszyzhang/RISurConv

Paper Structure (21 sections, 16 equations, 8 figures, 10 tables, 1 algorithm)

This paper contains 21 sections, 16 equations, 8 figures, 10 tables, 1 algorithm.

Figures (8)

  • Figure 1: Rotation Invariant Surface Property (RISP) construction: Given a point $p$ as the reference point, $K$ ($K = 8$ in this example) nearest points are selected (middle). For each neighbor $x_{i}$, two adjacent neighbors $x_{i-1}$ and $x_{i+1}$ are used to form two triangular local surfaces (right), based on which rotation invariant properties are constructed.
  • Figure 2: RISurConv operator. For a local point set with $p$ as the reference (red), the $K$ nearest neighbors are shown in blue. We compute the Rotation Invariant Surface Properties at each neighbor by constructing local dual triangle surfaces (see the section on RISP). These properties are embedded into a high-dimensional space by a shared multi-layer perceptron (MLP), followed by a self-attention layer that produces refined features. Concatenated with the previous layer's features (if any), the features of these local points are passed to further MLPs and then summarized by max pooling. A second self-attention layer follows to refine the features further.
  • Figure 3: Our neural network architecture comprises five RISurConv layers that extract rotation invariant features, followed by a Transformer Encoder that enhances the learnt features before the fully connected layers for object classification. We add a decoder with skip connections for the segmentation task.
  • Figure 4: Qualitative comparisons (Red indicates wrong).
  • Figure 5: Histogram comparison for normalized feature values without and with self-attention layers.
  • ...and 3 more figures
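
The dual-triangle construction in Figure 1 can be illustrated with a small numerical check. The sketch below does not reproduce the paper's 14-dimensional RISP, whose exact components are not listed here. It computes a hypothetical six-value descriptor (three lengths, two in-plane angles, and the angle between the two triangle normals) for the triangles formed by $p$, $x_i$ and each of $x_{i-1}$, $x_{i+1}$, and confirms that the values do not change under a random rotation. All function names and the choice of features are assumptions made for illustration.

```python
import numpy as np

def triangle_normal(a, b, c):
    """Unit normal of the triangle (a, b, c)."""
    n = np.cross(b - a, c - a)
    return n / np.linalg.norm(n)

def angle(u, v):
    """Angle in radians between vectors u and v."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def dual_triangle_features(p, x_prev, x_i, x_next):
    """Hypothetical descriptor built from lengths and angles only (not the paper's RISP)."""
    e = x_i - p                              # shared edge of the two triangles
    n1 = triangle_normal(p, x_i, x_prev)     # normal of the first triangle
    n2 = triangle_normal(p, x_next, x_i)     # normal of the second triangle
    return np.array([
        np.linalg.norm(e),
        np.linalg.norm(x_prev - x_i),
        np.linalg.norm(x_next - x_i),
        angle(e, x_prev - p),
        angle(e, x_next - p),
        angle(n1, n2),
    ])

def random_rotation(rng):
    """Random 3x3 rotation matrix (orthogonal, determinant +1) from a QR factorization."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0
    return q

rng = np.random.default_rng(0)
pts = rng.normal(size=(4, 3))                # rows: p, x_{i-1}, x_i, x_{i+1}
R = random_rotation(rng)
f_before = dual_triangle_features(*pts)
f_after = dual_triangle_features(*(pts @ R.T))
print(np.allclose(f_before, f_after))
```

Lengths and angles depend only on inner products of difference vectors, which rotations preserve, so any descriptor assembled from them is rotation invariant by construction.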