PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning

Qingdong He, Jiangning Zhang, Jinlong Peng, Haoyang He, Xiangtai Li, Yabiao Wang, Chengjie Wang

TL;DR

PointRWKV is a linear-complexity model derived from the RWKV model in NLP, with the modifications needed for point cloud learning. It is designed as a multi-scale framework for hierarchical feature learning on 3D point clouds and supports various downstream tasks.

Abstract

Transformers have revolutionized point cloud learning, but their quadratic complexity hinders extension to long sequences and burdens limited computational resources. The recent advent of RWKV, a fresh breed of deep sequence models, has shown immense potential for sequence modeling in NLP tasks. In this paper, we present PointRWKV, a model of linear complexity derived from the RWKV model in the NLP field with the necessary modifications for point cloud learning tasks. Specifically, taking the embedded point patches as input, we first explore the global processing capabilities within PointRWKV blocks using modified multi-headed matrix-valued states and a dynamic attention recurrence mechanism. To extract local geometric features simultaneously, we design a parallel branch that encodes the point cloud efficiently in a fixed-radius near-neighbors graph with a graph stabilizer. Furthermore, we design PointRWKV as a multi-scale framework for hierarchical feature learning of 3D point clouds, facilitating various downstream tasks. Extensive experiments on different point cloud learning tasks show that PointRWKV outperforms its transformer- and mamba-based counterparts while saving about 42% of FLOPs, demonstrating its potential as an option for constructing foundational 3D models.
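
The fixed-radius near-neighbors graph mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a standard uniform-grid lookup, it does not model the graph stabilizer, and the names `fixed_radius_graph`, `cloud`, and `graph` are hypothetical. It only shows what "connect every point to all points within a fixed radius" means in practice.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def fixed_radius_graph(points, radius):
    """Return sorted neighbor index lists: j is in result[i] iff i != j and
    the Euclidean distance between points[i] and points[j] is <= radius.

    Illustrative helper only (assumption, not taken from the paper).
    """
    if radius <= 0:
        raise ValueError("radius must be positive")

    # Bucket every point into a cubic grid cell with side length `radius`.
    grid = defaultdict(list)
    cells = []
    for idx, p in enumerate(points):
        cell = tuple(floor(c / radius) for c in p)
        cells.append(cell)
        grid[cell].append(idx)

    # Any neighbor within `radius` must lie in the same or an adjacent cell,
    # so only 27 cells are inspected per point instead of the whole cloud.
    offsets = list(product((-1, 0, 1), repeat=3))
    neighbors = []
    for idx, p in enumerate(points):
        cx, cy, cz = cells[idx]
        found = []
        for dx, dy, dz in offsets:
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                if j != idx and dist(p, points[j]) <= radius:
                    found.append(j)
        neighbors.append(sorted(found))
    return neighbors


# Toy cloud: three nearby points and one far-away point.
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.15, 0.0), (1.0, 1.0, 1.0)]
graph = fixed_radius_graph(cloud, radius=0.2)
```

Unlike a k-nearest-neighbors graph, the number of neighbors per point varies with local density here, and an isolated point (the last one in the toy cloud) simply gets an empty neighbor list.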

Paper Structure (14 sections, 9 equations, 5 figures, 9 tables)

Figures (5)

  • Figure 1: Architecture comparison of different methods: (a) MLP-based PointMLP ma2022rethinking, (b) transformer-based PointMAE pang2022masked, (c) mamba-based PointMamba liang2024pointmamba, and (d) our PointRWKV, which has linear complexity and integrates the advantages of both global and local modeling, while its multi-scale features give it more refined prediction accuracy.
  • Figure 2: Accuracy-speed tradeoff. (Left) Overall accuracy achieved by different methods relative to their parameter counts. (Right) Growth in FLOPs with sequence length.
  • Figure 3: Overview of the proposed PointRWKV, which employs a hierarchical architecture to encode multi-scale point cloud features. The whole framework is composed of a series of PRWKV blocks which include the integrative feature modulation branch and the local graph-based merging branch to form the parallel feature learning strategy.
  • Figure 4: Part segmentation results on ShapeNetPart. Top row is ground truth and bottom row is our prediction.
  • Figure A1: Qualitative part segmentation results on ShapeNetPart. Top row is ground truth and bottom row is our prediction.