Learning Coordinate-based Convolutional Kernels for Continuous SE(3) Equivariant and Efficient Point Cloud Analysis

Jaein Kim, Hee Bin Yoo, Dong-Sig Han, Byoung-Tak Zhang

Abstract

Symmetry under rigid motion is one of the salient factors in efficient learning for 3D point cloud problems. Group convolution has been a representative method for extracting equivariant features, but its realizations have struggled to retain rigorous symmetry and scalability simultaneously. We advocate the intertwiner framework to resolve this trade-off; however, previous works based on it did not achieve complete SE(3) symmetry or scalability to large-scale problems, which calls for a more advanced kernel architecture. We present Equivariant Coordinate-based Kernel Convolution, or ECKConv. It acquires SE(3) equivariance from a kernel domain defined in a double coset space, and its explicit kernel design using coordinate-based networks enhances its learning capability and memory efficiency. Experiments on diverse point cloud tasks, e.g., classification, pose registration, part segmentation, and large-scale semantic segmentation, validate the rigid equivariance, memory scalability, and outstanding performance of ECKConv compared to state-of-the-art equivariant methods.

Paper Structure (35 sections, 1 theorem, 14 equations, 9 figures, 7 tables, 2 algorithms)

This paper contains 35 sections, 1 theorem, 14 equations, 9 figures, 7 tables, 2 algorithms.

Key Result

Proposition 4.1

Let $C_\text{in}$ be the input feature dimension, $C_\text{out}$ the output feature dimension, $K$ the number of neighbors, and $A$ the number of anchor bases. Then the cost of the derivative with respect to $\theta$, the parameter of $\omega$, reduces from $\mathcal{O}(A\,K\,

Figures (9)

  • Figure 1: On the left, random SE(3) transformations are applied to the same object point cloud. The black dots are centroids and the colored plus signs mark neighbor points; they are the same points in different poses. If these point clouds are transformed so that the reference points meet on the subgroup, i.e., the SO(2) axis, points with identical topology are guaranteed to lie on the same disjoint orbit, or double coset, around the axis. Therefore, an SE(3) equivariant operation is achievable by using the unique parameter that defines those orbits. We visualize this concept with an example drawn from ModelNet40 \cite{wu20153d}.
  • Figure 2: The implementation details of ECKConv. (a) The neighbor points within a radius $r$ around each centroid are sub-sampled by ball query \cite{qi2017pointnet++}. After they are aligned by the inverse of the section map as in \ref{eq:cse2conv_op}, the double coset parameters $\bar{\beta}_g$, $\bar{r}_g$, and $\bar{z}_g$ can be obtained as depicted. These neighbor points are computed from a ModelNet40 \cite{wu20153d} object with normal vectors. (b) The computation of the explicit kernel in ECKConv. First, a double coset parameter $\bar{x}_i=[\bar{\beta}_i, \bar{r}_i, \bar{z}_i]$ is mapped to the Gaussian embedding $[\Psi(\bar{\beta}_i/\pi),\Psi(\bar{r}_i),\Psi((\bar{z}_i+1)/2)]$ \cite{zheng2021rethinking, zheng2022trading}. Then the embedding is projected by $F_\theta$ to $\omega(\bar{x};\theta)$, a coefficient vector of dimension $A$. The kernel value $\kappa(\cdot)$ is obtained as the weighted sum of the learnable bases $[\mathbf{W}_{j}]^{A}_{j=1}$ with weights $\omega(\bar{x}_{i};\theta)$, i.e., $\sum^{A}_{j=1}\omega_{j}(\bar{x}_{i};\theta)\mathbf{W}_j$.
  • Figure 3: Architectures of ECKConv and its residual connection. We apply batch normalizations and activations ($\sigma$) in the order shown in the figure. As we only visualize the abstract flow of the ECKConv block and its residual block, please refer to \ref{sec:supp_layer_detail} in the supplementary material for more details.
  • Figure 4: Visualization of part segmentation on ShapeNet by ECKConv (Ours), Vector Neurons \cite{deng2021vector}, and CSEConv \cite{kim2024continuous}.
  • Figure 5: Semantic segmentation results by ECKConv on Area 5 of S3DIS \cite{armeni20163d}. The ceiling and wall points that obscure the viewpoint are omitted from the figures for better visualization. (Left) The input, ground truth labels, and predictions for office 28, lobby 1, and conference room 2 in Area 5. Various indoor objects, e.g., bookcase, table, chair, and sofa, are successfully segmented from the scenes by our method. (Right) A simulation that randomly places chair objects from ModelNet40 \cite{wu20153d} in hallway 8 of Area 5. Even though the chair objects are out-of-distribution for S3DIS, our method segments every object in the simulated scenes.
  • ...and 4 more figures
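
The explicit kernel computation described in the Figure 2(b) caption can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The exact form of $\Psi$ (Gaussians at evenly spaced centers with a fixed width), the use of a single linear layer for $F_\theta$, the $C_\text{out} \times C_\text{in}$ shape of each basis $\mathbf{W}_j$, and all function names and sizes below are assumptions made for illustration only.

```python
import math
import random


def gaussian_embedding(x, num_centers=8, sigma=0.1):
    """Assumed form of Psi: Gaussian responses of a scalar x in [0, 1]
    at evenly spaced centers (the paper's exact Psi may differ)."""
    centers = [k / (num_centers - 1) for k in range(num_centers)]
    return [math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]


def embed_coset_param(beta, r, z, num_centers=8):
    """Concatenate Psi(beta/pi), Psi(r), Psi((z+1)/2), as in the caption."""
    return (gaussian_embedding(beta / math.pi, num_centers)
            + gaussian_embedding(r, num_centers)
            + gaussian_embedding((z + 1.0) / 2.0, num_centers))


def coefficients(embedding, theta):
    """Hypothetical F_theta: one linear layer giving A coefficients omega."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in theta]


def kernel_value(omega, bases):
    """kappa = sum_j omega_j * W_j, each W_j assumed to be C_out x C_in."""
    c_out, c_in = len(bases[0]), len(bases[0][0])
    kappa = [[0.0] * c_in for _ in range(c_out)]
    for w_j, basis in zip(omega, bases):
        for o in range(c_out):
            for i in range(c_in):
                kappa[o][i] += w_j * basis[o][i]
    return kappa


# Illustrative sizes only (not taken from the paper).
rng = random.Random(0)
A, C_IN, C_OUT, NUM_CENTERS = 4, 3, 2, 8
theta = [[rng.gauss(0, 0.1) for _ in range(3 * NUM_CENTERS)] for _ in range(A)]
bases = [[[rng.gauss(0, 1) for _ in range(C_IN)] for _ in range(C_OUT)]
         for _ in range(A)]

# One double coset parameter x_bar = [beta, r, z] with beta in [0, pi],
# r in [0, 1], z in [-1, 1] (ranges inferred from the caption's scaling).
x_bar = (0.5 * math.pi, 0.3, -0.2)
omega = coefficients(embed_coset_param(*x_bar, num_centers=NUM_CENTERS), theta)
kappa = kernel_value(omega, bases)
```

In this sketch the only learnable quantities are `theta` and the $A$ shared bases, so each neighbor needs just an $A$-dimensional coefficient vector to select its kernel, which is the sense in which the kernel is an explicit function of the double coset coordinate.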

Theorems & Definitions (2)

  • Proposition 4.1
  • proof