Hierarchical Direction Perception via Atomic Dot-Product Operators for Rotation-Invariant Point Clouds Learning

Chenyu Hu, Xiaotong Li, Hao Zhu, Biao Hou

TL;DR

This work tackles the sensitivity of point-cloud representations to arbitrary 3D rotations by introducing DiPVNet, a direction-aware framework built on atomic dot-product operators. It jointly learns local directional cues via Learnable Local Dot-Product (L2DP) and global directional structure through the Direction-Aware Spherical Fourier Transform (DASFT), with cross-attention fusing invariant and canonical-projected equivariant features. The approach provides rotation invariance and adaptive directional perception across multiple scales, validated by state-of-the-art results on rotation-robust classification and segmentation benchmarks. Its combination of local invariants and global spectral cues offers a practical pathway to robust 3D perception under challenging pose variations, with potential applicability beyond point clouds to other geometric data modalities.

Abstract

Point cloud processing has become a cornerstone technology in many 3D vision tasks. However, arbitrary rotations introduce variations in point cloud orientations, posing a long-standing challenge for effective representation learning. The core of this issue is the disruption of the point cloud's intrinsic directional characteristics caused by rotational perturbations. Recent methods attempt to implicitly model rotational equivariance and invariance, preserving directional information and propagating it into deep semantic spaces. Yet, they often fall short of fully exploiting the multiscale directional nature of point clouds to enhance feature representations. To address this, we propose the Direction-Perceptive Vector Network (DiPVNet). At its core is an atomic dot-product operator that simultaneously encodes directional selectivity and rotation invariance, endowing the network with both rotational symmetry modeling and adaptive directional perception. At the local level, we introduce a Learnable Local Dot-Product (L2DP) Operator, which enables interactions between a center point and its neighbors to adaptively capture the non-uniform local structures of point clouds. At the global level, we leverage generalized harmonic analysis to prove that the dot-product between point clouds and spherical sampling vectors is equivalent to a direction-aware spherical Fourier transform (DASFT). This leads to the construction of a global directional response spectrum for modeling holistic directional structures. We rigorously prove the rotation invariance of both operators. Extensive experiments on challenging scenarios involving noise and large-angle rotations demonstrate that DiPVNet achieves state-of-the-art performance on point cloud classification and segmentation tasks. Our code is available at https://github.com/wxszreal0/DiPVNet.
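The rotation invariance that the abstract attributes to dot products rests on a standard identity: for any rotation $R$, $\langle Ra, Rb\rangle = \langle a, b\rangle$. The sketch below checks this numerically for center-relative neighbor offsets in a k-nearest-neighbor neighborhood. It is only an illustration of the underlying identity, not the L2DP operator itself: the helper names `random_rotation` and `local_dot_products`, the neighborhood size, and the use of a plain Gram matrix with no learnable parameters are all assumptions made here.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random proper rotation matrix (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # normalize column signs
    if np.linalg.det(q) < 0:      # turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q

def local_dot_products(points, k):
    """Gram matrices of neighbor offsets (hypothetical helper, not from the paper).

    For each point, take its k nearest neighbors, subtract the center point,
    and return the dot products between all pairs of offsets: shape (N, k, k).
    """
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]      # column 0 is the point itself
    offsets = points[idx] - points[:, None, :]    # (N, k, 3)
    return offsets @ offsets.transpose(0, 2, 1)   # (N, k, k)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(64, 3))
R = random_rotation(rng)

before = local_dot_products(cloud, k=8)
after = local_dot_products(cloud @ R.T, k=8)      # each row p becomes R p

# The local dot products agree up to floating-point error.
assert np.allclose(before, after)
```

Because pairwise distances are also preserved by rotation, the neighbor sets are unchanged here, so the comparison is entry by entry.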

Paper Structure

This paper contains 36 sections, 62 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: We extract direction-sensitive local features via the L2DP operator; concurrently, a global directional response spectrum is constructed through the DASFT module, capturing the directional characteristics of the overall structure.
  • Figure 2: DiPVNet single-layer architecture. Point cloud features $\mathcal{P}$ are transformed into graph features $\mathcal{G}$ via KNN graph construction. In the DiPVNet layer, the VNN Block models rotation equivariance. Concurrently, the L2DP operator processes graph features through $\Phi_L$ and extracts local directional features via the aggregation mapping $\varphi(\cdot,G(v))$, and the DASFT module constructs a global directional response spectrum through the dot-product operator $\Phi_G$ between point clouds and spherical sampling vectors. Local and global features are fused via a cross-attention mechanism, and the resulting discriminative directional features are used for downstream tasks.
  • Figure 3: Partial process of the L2DP operator acting on the $j$-th center point $v_j$. The center point feature is replicated $k$ times and subtracted from the neighboring features in the $k$-nearest-neighbor neighborhood $\mathcal{G}_j$; a dot-product operation is then performed to obtain directional information relative to the center point and its positional encoding; and the result is fed into an FFN.
  • Figure 4: The point cloud $\mathcal{P}$ is projected via dot-products with spherical sampling unit vectors $\Omega$ of varying frequency amplitudes, yielding the spherical frequency-domain response $F(\mathcal{P},\{\Omega\})$ of global features, subsequently constructing the directional response spectrum $E(\mathcal{P},\{\Omega\})$ which characterizes the dominant directions of the point cloud's macroscopic structure.
  • Figure 5: Visualization of segmentation results.
  • ...and 1 more figure
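The projection step named in the Figure 4 caption, dot products between the point cloud and unit vectors sampled on the sphere, can be sketched as below. This is a minimal illustration under stated assumptions: the Fibonacci-lattice sampling, the helper names `fibonacci_sphere` and `directional_response`, and the fixed test rotation are choices made here, and the paper's actual response $F(\mathcal{P},\{\Omega\})$ and spectrum $E(\mathcal{P},\{\Omega\})$ are not reproduced. The sketch only verifies the identity $\langle Rp, \omega\rangle = \langle p, R^{\top}\omega\rangle$, which says that rotating the cloud is equivalent to counter-rotating the sampling directions.

```python
import numpy as np

def fibonacci_sphere(m):
    """Return m roughly uniform unit vectors on the sphere (Fibonacci lattice)."""
    i = np.arange(m) + 0.5
    z = 1.0 - 2.0 * i / m
    phi = np.pi * (1.0 + np.sqrt(5.0)) * i
    r = np.sqrt(1.0 - z ** 2)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def directional_response(points, directions):
    """Dot product of every point with every sampling direction: shape (N, M)."""
    return points @ directions.T

def rotation_zx(alpha, beta):
    """Rotation by alpha about z followed by beta about x (illustrative only)."""
    ca, sa, cb, sb = np.cos(alpha), np.sin(alpha), np.cos(beta), np.sin(beta)
    rz = np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cb, -sb], [0.0, sb, cb]])
    return rx @ rz

rng = np.random.default_rng(0)
cloud = rng.normal(size=(100, 3))
omega = fibonacci_sphere(128)
R = rotation_zx(0.7, 1.9)

rotated_cloud = cloud @ R.T          # each row p becomes R p
counter_rotated_omega = omega @ R    # each row w becomes R^T w

# <R p, w> equals <p, R^T w> for every point and every direction.
assert np.allclose(directional_response(rotated_cloud, omega),
                   directional_response(cloud, counter_rotated_omega))
```

With a fixed, finite set of directions the raw response matrix itself changes under rotation; the identity only shows how it changes, which is the starting point for building rotation-invariant quantities from it.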