Efficient Point Cloud Processing with High-Dimensional Positional Encoding and Non-Local MLPs

Yanmei Zou, Hongshan Yu, Yaonan Wang, Zhengeng Yang, Xieyuanli Chen, Kailun Yang, Naveed Akhtar

TL;DR

This work develops a two-stage abstraction and refinement (ABS-REF) view of modular feature extraction in point cloud processing and proposes a High-dimensional Positional Encoding (HPE) module that explicitly uses intrinsic positional information, extending the "positional encoding" concept from the Transformer literature.

Abstract

Multi-Layer Perceptron (MLP) models are the foundation of contemporary point cloud processing. However, their complex network architectures obscure the source of their strength and limit the application of these models. In this article, we develop a two-stage abstraction and refinement (ABS-REF) view for modular feature extraction in point cloud processing. This view elucidates that whereas the early models focused on ABS stages, the more recent techniques devise sophisticated REF stages to attain performance advantages. Then, we propose a High-dimensional Positional Encoding (HPE) module to explicitly utilize intrinsic positional information, extending the "positional encoding" concept from the Transformer literature. HPE can be readily deployed in MLP-based architectures and is compatible with transformer-based methods. Within our ABS-REF view, we rethink local aggregation in MLP-based methods and propose replacing the time-consuming local MLP operations used to capture local relationships among neighbors with non-local MLPs for efficient non-local information updates, combined with the proposed HPE for effective local information representation. We leverage our modules to develop HPENets, a suite of MLP networks that follow the ABS-REF paradigm, incorporating a scalable HPE-based REF stage. Extensive experiments on seven public datasets across four different tasks show that HPENets deliver a strong balance between efficiency and effectiveness. Notably, HPENet surpasses PointNeXt, a strong MLP-based counterpart, by 1.1% mAcc, 4.0% mIoU, 1.8% mIoU, and 0.2% Cls. mIoU, with only 50.0%, 21.5%, 23.1%, and 44.4% of FLOPs on ScanObjectNN, S3DIS, ScanNet, and ShapeNetPart, respectively. Source code is available at https://github.com/zouyanmei/HPENet_v2.git.
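
To make the idea of lifting positions into a higher-dimensional space concrete, the following is a minimal sketch, not the authors' HPE module. The abstract does not give the HPE formulation, so the sketch assumes a Transformer-style sinusoidal encoding applied to the relative coordinates of neighboring points. The function name `sinusoidal_encoding`, the parameter `num_freqs`, and the power-of-two frequency schedule are all illustrative assumptions.

```python
import numpy as np


def sinusoidal_encoding(rel_xyz, num_freqs=4):
    """Lift relative coordinates (..., 3) to (..., 3 * 2 * num_freqs) features.

    Assumption: a generic sinusoidal encoding, used here only as a stand-in
    for a high-dimensional positional encoding.
    """
    freqs = 2.0 ** np.arange(num_freqs)      # hypothetical frequency schedule, shape (F,)
    scaled = rel_xyz[..., None] * freqs      # shape (..., 3, F)
    # For each coordinate: F sine features followed by F cosine features.
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*rel_xyz.shape[:-1], -1)


rng = np.random.default_rng(0)
points = rng.random((8, 3))                  # toy neighborhood of 8 points
rel = points - points[0]                     # offsets relative to the first point
enc = sinusoidal_encoding(rel)               # 3-D offsets become 24-D features
```

In this sketch each 3-D offset becomes a 24-dimensional vector that a subsequent MLP could consume alongside point features; the actual dimensionality and encoding function used in HPENets may differ.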

Paper Structure (33 sections, 13 equations, 16 figures, 15 tables)

This paper contains 33 sections, 13 equations, 16 figures, 15 tables.

Figures (16)

  • Figure 1: Segmentation performance of HPENet V2 on S3DIS [43]. Bubble diameter is proportional to the number of model parameters, and the legend lists reference sizes of 1M, 3M, 20M, and 40M. HPENet V2 achieves higher throughput and competitive or better mIoU than state-of-the-art methods, including PointMetaBase [98], PointNeXt [5], and PointVector [104], while using fewer parameters. Compared with our previous HPENet [105], HPENet V2 delivers comparable performance while using fewer parameters and achieving about $2.2\times$ faster inference.
  • Figure 2: HPENet V2 architecture for semantic segmentation. The network delineates between the Abstraction (ABS) and Refinement (REF) stages of feature extraction and employs the proposed High-dimensional Positional Encoding (HPE) module in both stages. The differences between HPENet V2 and HPENet [105] are highlighted with red borders, such as the Backward Fusion Module (BFM) and the non-local MLPs.
  • Figure 3: Illustration of abstraction and refinement (ABS-REF) perspective. Left: The proposed ABS-REF view of point cloud models is analogous to subsampling and convolution block view in image models. The shown ABS-REF column expands the abstraction and refinement stages. Right: Representative instantiations of the ABS-REF framework. Whereas early methods, e.g., PointNet++ [8], PointConv [20], ignore the REF stage, more recent techniques, e.g., Point Transformer [3] and PointMixer [4], achieve higher performance by accounting for the REF stage in point cloud models. Abbreviations include SOP: Symmetric OPeration, OP: aggregation OPeration, PT: Point Transformer, and HPE: proposed High-dimensional Positional Encoding.
  • Figure 4: Illustration of separable Abstraction and Refinement (ABS-REF) perspective. Left: The separable ABS-REF view also belongs to the proposed ABS-REF by disentangling the intra-set operation and inter-set operation into non-local MLPs and the reduction operation. Right: Representative instantiations of the separable ABS-REF framework and the evolution of HPENets, i.e., from HPENet to HPENet V2. Please refer to Fig. 3 for the feature propagation in the proposed ABS-REF view.
  • Figure 5: Illustration of local aggregation in the ABS stages of MLP-based methods. (a) PreConv processes input point sets with non-local MLPs before sampling operations. (b) Traditional Conv processes neighboring point sets generated by grouping operations with local MLPs. (c) ProConv processes sampled point sets with non-local aggregation after reduction operations. Please refer to the text for further notations.
  • ...and 11 more figures
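
The contrast drawn in the Figure 5 caption between local MLPs on grouped neighbors and non-local MLPs applied before grouping can be illustrated with a small sketch. It rests on a general fact rather than on anything specific to the paper: a point-wise linear layer commutes with neighbor gathering, so applying it to the N points first and gathering afterwards gives the same features as gathering N' x K neighbors and then applying it, at lower cost. The sizes `n`, `n_sub`, `k`, `c`, the random neighbor indices, and the helper `linear_macs` are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np


def linear_macs(rows, c_in, c_out):
    """Multiply-accumulate count of one linear layer applied to `rows` vectors."""
    return rows * c_in * c_out


# Hypothetical sizes: input points, sampled centers, neighbors per center, channels.
n, n_sub, k, c = 1024, 256, 32, 64

rng = np.random.default_rng(0)
feats = rng.standard_normal((n, c))
weight = rng.standard_normal((c, c))
neighbor_idx = rng.integers(0, n, size=(n_sub, k))  # stand-in for a grouping step

# Local MLP: gather neighbors first, then apply the layer to n_sub * k vectors.
local_out = feats[neighbor_idx] @ weight            # shape (n_sub, k, c)

# Non-local MLP: apply the layer to all n points once, then gather.
nonlocal_out = (feats @ weight)[neighbor_idx]       # shape (n_sub, k, c)

same = np.allclose(local_out, nonlocal_out)
macs_local = linear_macs(n_sub * k, c, c)
macs_nonlocal = linear_macs(n, c, c)
print(same, macs_local / macs_nonlocal)
```

With these toy sizes the gathered-then-transformed path performs 8 times as many multiply-accumulates as the transform-then-gather path. The equivalence covers per-point features only; relative positions within a neighborhood are not preserved by a point-wise layer applied before grouping, which is plausibly why the abstract pairs non-local MLPs with HPE for local information.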