Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and Segmentation

Zihao Li, Pan Gao, Kang You, Chuan Yan, Manoranjan Paul

TL;DR

This work tackles two inefficiencies in point-cloud networks: the input embedding of raw coordinates and local neighbor aggregation. It introduces the Contextual Position-enhanced Transformer (CPT) to produce a global-aware input embedding, and the Dual-domain K-nearest neighbor Feature Fusion (DKFF) module to learn from both the spatial and feature domains. Both are integrated in a shared stem with task-specific branches for classification and segmentation. The approach achieves state-of-the-art results on ModelNet40, ScanObjectNN, ShapeNetPart, and S3DIS, demonstrating strong generalization across object-level and scene-level tasks. CPT and DKFF are plug-and-play blocks that can be trained end to end and could enhance other 3D vision tasks such as completion, denoising, and compression.

Abstract

Previous studies have demonstrated the effectiveness of point-based neural models on point cloud analysis tasks. However, there remains a crucial issue in producing an efficient input embedding for raw point coordinates. Another issue lies in the limited efficiency of neighbor aggregation, which is a critical component of the network stem. In this paper, we propose a Global Attention-guided Dual-domain Feature Learning network (GAD) to address these issues. We first devise the Contextual Position-enhanced Transformer (CPT) module, which is equipped with an improved global attention mechanism, to produce a global-aware input embedding that guides subsequent aggregations. Then, the Dual-domain K-nearest neighbor Feature Fusion (DKFF) module is cascaded to conduct effective feature aggregation through novel dual-domain feature learning, which captures both local geometric relations and long-distance semantic connections. Extensive experiments on multiple point cloud analysis tasks (e.g., classification, part segmentation, and scene semantic segmentation) demonstrate the superior performance of the proposed method and the efficacy of the devised modules.
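
To make the dual-domain idea concrete, the following is a minimal NumPy sketch of gathering neighbors twice: once by k-nearest neighbors in coordinate space (local geometry) and once by k-nearest neighbors in feature space (points that may be far apart but are semantically similar). It is not the authors' DKFF module. The function names `knn_indices` and `dual_domain_aggregate`, the max-pooling over neighbors, and the fusion by concatenation are all assumptions chosen for illustration; the actual module presumably uses learned layers rather than fixed pooling.

```python
import numpy as np

def knn_indices(x, k):
    """Return indices of the k nearest rows for each row of x (self excluded)."""
    # Pairwise squared Euclidean distances, shape (N, N).
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def dual_domain_aggregate(coords, feats, k):
    """Hypothetical fusion: max-pool neighbor features from both domains, then concatenate."""
    idx_spatial = knn_indices(coords, k)  # (N, k) neighbors by geometry
    idx_feature = knn_indices(feats, k)   # (N, k) neighbors by feature similarity
    agg_spatial = feats[idx_spatial].max(axis=1)  # (N, C)
    agg_feature = feats[idx_feature].max(axis=1)  # (N, C)
    return np.concatenate([agg_spatial, agg_feature], axis=1)  # (N, 2C)

rng = np.random.default_rng(0)
coords = rng.normal(size=(128, 3))   # N = 128 points, xyz coordinates
feats = rng.normal(size=(128, 16))   # C = 16 feature channels per point
fused = dual_domain_aggregate(coords, feats, k=8)
print(fused.shape)  # (128, 32)
```

The two index sets generally differ: spatial neighbors stay geometrically close, while feature-space neighbors can lie anywhere in the cloud, which is one way to read the abstract's "long-distance semantic connections".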

Paper Structure

This paper contains 25 sections, 13 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Network architecture of the proposed method. "MLP" refers to the Multi-Layer Perceptron; "CPT" represents the devised Contextual Position-enhanced Transformer module; "DKFF" means the Double K-nearest neighbor Feature Fusion module; $N$ refers to the number of points in the input point cloud.
  • Figure 2: The proposed Contextual Position-enhanced Transformer (CPT) module. $X$ refers to the original point cloud coordinates; $F$ refers to the point cloud features; $N$ represents the number of points in the input point cloud; $C$ denotes the feature channel dimension; MLP means multilayer perceptron.
  • Figure 3: The proposed Double K-nearest neighbor Feature Fusion (DKFF) module. $N$ represents the number of points in the input point cloud; $C$ denotes the feature channel dimension; MLP means multilayer perceptron.
  • Figure 4: Visual comparison with other methods for part segmentation.
  • Figure 5: Visualization of semantic segmentation results on the S3DIS dataset.