TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis

Pavlo Melnyk, Andreas Robinson, Michael Felsberg, Mårten Wadenbäck

TL;DR

The results reveal the practical value of steerable 3D spherical neurons for learning in 3D Euclidean space, and TetraSphere outperforms all equivariant methods on randomly rotated synthetic data.

Abstract

In many practical applications, 3D point cloud analysis requires rotation invariance. In this paper, we present a learnable descriptor invariant under 3D rotations and reflections, i.e., the O(3) actions, utilizing the recently introduced steerable 3D spherical neurons and vector neurons. Specifically, we propose an embedding of the 3D spherical neurons into 4D vector neurons, which leverages end-to-end training of the model. In our approach, we perform TetraTransform, an equivariant embedding of the 3D input into 4D constructed from the steerable neurons, and extract deeper O(3)-equivariant features using vector neurons. This integration of the TetraTransform into the VN-DGCNN framework, termed TetraSphere, increases the number of parameters by less than 0.0002%. TetraSphere sets a new state of the art in classifying randomly rotated real-world object scans from the challenging subsets of ScanObjectNN. Additionally, TetraSphere outperforms all equivariant methods on randomly rotated synthetic data: classifying objects from ModelNet40 and segmenting parts of the ShapeNet shapes. Thus, our results reveal the practical value of steerable 3D spherical neurons for learning in 3D Euclidean space. The code is available at https://github.com/pavlo-melnyk/tetrasphere.
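
To make the notion of O(3) invariance concrete, the following is a minimal sketch, not the TetraSphere invariant layer itself. It assumes a small set of 3D vector features that transform equivariantly (each feature is multiplied by the same orthogonal matrix) and checks that their pairwise inner products do not change under a rotation combined with a reflection. All names (`gram`, `rotation_z`, `REFLECTION`, and so on) are illustrative and do not come from the paper or its code.

```python
import math

def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def mat_mul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_z(theta):
    """Rotation about the z axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotation_x(theta):
    """Rotation about the x axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

# Reflection across the yz-plane (determinant -1).
REFLECTION = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def gram(features):
    """Pairwise inner products of a list of 3D vector features."""
    return [[sum(p * q for p, q in zip(u, v)) for v in features]
            for u in features]

# An O(3) element with determinant -1: two rotations, then a reflection.
Q = mat_mul(REFLECTION, mat_mul(rotation_z(0.7), rotation_x(-1.3)))

# Hypothetical equivariant features (arbitrary values for illustration).
features = [[0.2, -1.0, 0.5], [1.5, 0.3, -0.4], [-0.7, 0.9, 2.0]]
transformed = [mat_vec(Q, f) for f in features]

# The Gram matrix is unchanged up to floating-point rounding.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(gram(features), gram(transformed))
               for a, b in zip(row_a, row_b))
print(max_diff)
```

The features themselves change under `Q`, while their inner products do not. This is the general reason an inner-product construction on top of equivariant features can yield an invariant descriptor.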

Paper Structure (30 sections, 17 equations, 6 figures, 6 tables)

This paper contains 30 sections, 17 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Key component in our method (best viewed in color): a learnable O(3)-equivariant TetraTransform layer consisting of $K$ steerable 3D spherical neurons \cite{melnyk2022steerable} that lifts the input 3D points to equivariant 4D representations (see Section \ref{sec:so3-equiv_features} for details).
  • Figure 2: High-level architecture of TetraSphere (for classification): the equivariant TT layer (\ref{eq:tetratransform}) is followed by pooling over $K$ steerable spherical neurons and the application of the equivariant VN-DGCNN \cite{deng2021vector}, consisting of $d$ VN-layers $l_\textup{VN}$ (\ref{eq:vn_mlp}) and the block $l_{\textup{inv}}{(\,\cdot\,; \Theta, \Phi)}$ (\ref{eq:tetrasphere_b}), producing invariant features. The first (yellow) block contains the contributions of our work.
  • Figure 3: Examples of the objects from the hardest subset of ScanObjectNN \cite{uy-scanobjectnn-iccv19}: chair, table, pillow, and display.
  • Figure 4: (Best viewed in color.) Top: Tetra-basis projection is the output of a steerable 3D spherical neuron \cite{melnyk2022steerable}. Without loss of generality, consider one ($K=1$) steerable spherical neuron $B(\textbf{S})$ (see Section \ref{sec:steerable_3d_neurons}) with $\textbf{R}_{O} = \textbf{I}_5$, and the input point $\textbf{x}$ that happens to lie outside of the sphere $(\textbf{c}, r)$ with the learnable parameter vector $\textbf{S}$ (assume $\gamma=1$, and thus $\Tilde{\textbf{S}}=\textbf{S}$; see Section \ref{sec:spherical_neurons}) and its three rotated copies. Then the projection of $\textbf{x}$ in the tetra-basis $B(\textbf{S})$ is the vector $B(\textbf{S})\textbf{X}$ consisting of four scalar activations $\textbf{X}^\top \textbf{R}_{T_i} \textbf{S}$ of the respective spherical decision surfaces. Each activation determines the respective cathetus length, as per \cite{melnyk2020embed}. Bottom: Vector neurons \cite{deng2021vector} preserve the spatial dimension ($4$ in our case) and alter the latent dimension $C$ of the feature $\textbf{Y}$, see (\ref{eq:vn_linear}).
  • Figure 5: Learned $\gamma$ parameters for TetraSphere$_{K=8}$ trained on the OBJ_BG subset of ScanObjectNN (see Table \ref{tab:object_bg}). All but $\gamma_{7}$ converge close to 0.
  • ...and 1 more figure
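
The tetra-basis projection described in the Figure 4 caption can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the conformal embedding commonly used for spherical neurons, $\textbf{X} = (x_1, x_2, x_3, -1, -\tfrac{1}{2}\|\textbf{x}\|^2)$ and $\textbf{S} = (c_1, c_2, c_3, \tfrac{1}{2}(\|\textbf{c}\|^2 - r^2), 1)$, with $\gamma = 1$ and the sphere center placed along the $(1, 1, 1)$ direction, so that the four rotated copies have centers at the vertices of a regular tetrahedron. It checks equivariance only for one special rotation (120 degrees about the $(1, 1, 1)$ axis), for which the four activations are simply permuted. The function names are hypothetical.

```python
import math

def embed_point(x):
    """Assumed embedding X = (x1, x2, x3, -1, -||x||^2 / 2)."""
    return [x[0], x[1], x[2], -1.0, -0.5 * sum(v * v for v in x)]

def embed_sphere(c, r):
    """Assumed embedding S = (c1, c2, c3, (||c||^2 - r^2) / 2, 1)."""
    return [c[0], c[1], c[2], 0.5 * (sum(v * v for v in c) - r * r), 1.0]

def activation(x, c, r):
    """Scalar product X^T S, equal to -(||x - c||^2 - r^2) / 2."""
    return sum(p * q for p, q in zip(embed_point(x), embed_sphere(c, r)))

# Directions to the vertices of a regular tetrahedron (unnormalized).
TETRA = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def tetra_activations(x, center_norm, r):
    """Four activations of x for spheres centered at the tetrahedron vertices."""
    scale = center_norm / math.sqrt(3.0)
    return [activation(x, [scale * t for t in v], r) for v in TETRA]

def rotate_120(x):
    """Rotate by 120 degrees about the (1, 1, 1) axis (cyclic coordinate shift)."""
    return [x[2], x[0], x[1]]

x = [2.0, -0.5, 1.0]  # a point outside all four spheres
before = tetra_activations(x, center_norm=1.0, r=0.5)
after = tetra_activations(rotate_120(x), center_norm=1.0, r=0.5)

# This rotation fixes vertex 0 and cycles the other three, so the
# activations are permuted rather than changed in value.
expected = [before[0], before[3], before[1], before[2]]
print(before)
print(after)
```

For a point outside a sphere, `-2 * activation` equals $\|\textbf{x} - \textbf{c}\|^2 - r^2$, the squared length of the tangent segment from the point to the sphere, which is the cathetus the caption refers to. For a general rotation the four activations mix through a 4×4 matrix rather than a permutation; that general case is not covered by this sketch.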