DHGCN: Dynamic Hop Graph Convolution Network for Self-Supervised Point Cloud Learning

Jincen Jiang, Lizhi Zhao, Xuequan Lu, Wei Hu, Imran Razzak, Meili Wang

TL;DR

DHGCN tackles self-supervised learning for 3D point clouds by explicitly modeling contextual relationships between voxelized point parts through a self-supervised hop distance reconstruction task. It constructs a complete graph of parts, applies PartConvolution for part features, and uses a Hop Graph Attention mechanism that leverages learned hop distances (via a Gaussian kernel) to weight edge features during aggregation, with distances updated dynamically across layers. The approach yields a plug-and-play module compatible with common point-based backbones and demonstrates state-of-the-art unsupervised performance on downstream classification and shape part segmentation tasks, including robustness on real-world data. These results suggest that explicitly encoding part-level adjacency and distance information enhances representation learning for non-Euclidean 3D data, with practical impact for scalable 3D understanding without labels.

Abstract

Recent works attempt to extend Graph Convolution Networks (GCNs) to point clouds for classification and segmentation tasks. These works tend to sample and group points to create smaller point sets locally and mainly focus on extracting local features through GCNs, while ignoring the relationship between point sets. In this paper, we propose the Dynamic Hop Graph Convolution Network (DHGCN) for explicitly learning the contextual relationships between the voxelized point parts, which are treated as graph nodes. Motivated by the intuition that the contextual information between point parts lies in the pairwise adjacent relationship, which can be depicted by the hop distance of the graph quantitatively, we devise a novel self-supervised part-level hop distance reconstruction task and design a novel loss function accordingly to facilitate training. In addition, we propose the Hop Graph Attention (HGA), which takes the learned hop distance as input for producing attention weights to allow edge features to contribute distinctively in aggregation. Eventually, the proposed DHGCN is a plug-and-play module that is compatible with point-based backbone networks. Comprehensive experiments on different backbones and tasks demonstrate that our self-supervised method achieves state-of-the-art performance. Our source code is available at: https://github.com/Jinec98/DHGCN.
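
The TL;DR notes that HGA turns learned hop distances into attention weights via a Gaussian kernel. A minimal sketch of that idea follows, not the authors' implementation. The function names, the bandwidth `sigma`, and the sum-to-one normalization are assumptions made for illustration, since this page does not give the exact formulation.

```python
import math


def gaussian_hop_attention(hop_row, sigma=1.0):
    """Map one node's hop distances to attention weights that sum to 1.

    hop_row: hop distances from a query part to every part (hypothetical input).
    sigma:   assumed kernel bandwidth; larger values flatten the weights.
    """
    weights = [math.exp(-(d ** 2) / (2 * sigma ** 2)) for d in hop_row]
    total = sum(weights)
    return [w / total for w in weights]


def aggregate(edge_feats, attn):
    """Attention-weighted sum of edge features (one feature vector per edge)."""
    dim = len(edge_feats[0])
    return [sum(a * f[k] for a, f in zip(attn, edge_feats)) for k in range(dim)]


# Usage: parts that are fewer hops away contribute more to the aggregate.
attn = gaussian_hop_attention([0, 1, 2])
pooled = aggregate([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], attn)
```

Under this kernel the weight decays smoothly with hop distance, so nearby parts dominate the aggregation while distant parts still contribute a small amount.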

Paper Structure (20 sections, 11 equations, 4 figures, 7 tables)

This paper contains 20 sections, 11 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: First row: Constructing the ground truth graph for our self-supervised hop distance reconstruction task. (a): Voxelizing the point cloud into parts, taking each part as a graph node. (b): The topology of the ground truth graph. Two nodes are adjacent if their scaled bounding boxes intersect. (c): The shortest path between a node (enlarged red point) and other nodes. The number on each node denotes the hop distance, which motivates our self-supervised task. Second row: Sampling and grouping based strategy [dgcnn, wang2019graphattenconv]. (d): Sampling center points and grouping local point sets. (e): Constructing a local graph for each point set. We explore the contextual relationships between parts, while previous strategies focus on extracting local features of point sets.
  • Figure 2: DHGCN architecture: We feed the input point cloud to PointFeatureConv to extract point-wise representations, which are then passed to Hop Graph Convolution (HopGraphConv) to extract more accurate local geometric representations. Hop Graph Convolution: The HopGraphConv layer takes the point features as input and obtains part features through part-level pooling. We construct a complete graph by taking parts as nodes and connecting each pair of them, and use PartConv and HGA to extract graph edge features. We propose the self-supervised hop distance reconstruction task to predict the distance matrix of the complete graph from edge features. $\lambda$ controls whether the HGA embeds hop distance. Finally, edge features are aggregated and repeated at the part level, providing additional representations for the point-based backbone network.
  • Figure 3: Given the input point cloud, we first voxelize it into parts. For each part, we compute its scaled axis-aligned bounding box to calculate the adjacent relation between parts. We construct a ground truth graph along with its distance matrix for supervision in each layer.
  • Figure 4: Attention maps for different query parts (row 1), and feature distances from the query point (marked by a star) to all other points for our method (row 2) and DGCNN (row 3). Colors from yellow to red indicate increasing attention weight or closer feature distance.
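
The ground-truth construction described in Figures 1 and 3 (voxelize into parts, compute scaled axis-aligned bounding boxes, connect parts whose boxes intersect, then read hop distances off shortest paths) can be sketched as follows. This is an illustrative sketch in plain Python rather than the released code. The function name `hop_distance_matrix`, the uniform `grid` resolution, the `scale` factor of 1.5, and the use of `-1` for unreachable pairs are all assumptions.

```python
from collections import deque
from itertools import combinations


def hop_distance_matrix(points, grid=3, scale=1.5):
    """Return D where D[i][j] is the hop distance between parts i and j."""
    # Voxelize: assign each point to a cell of a grid x grid x grid lattice.
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    parts = {}
    for p in points:
        cell = tuple(
            min(int((p[a] - lo[a]) / max(hi[a] - lo[a], 1e-9) * grid), grid - 1)
            for a in range(3)
        )
        parts.setdefault(cell, []).append(p)
    cells = sorted(parts)  # fixed node order; empty cells never become nodes

    # Scaled axis-aligned bounding box per part, stored as (center, half-extent).
    boxes = []
    for cell in cells:
        pts = parts[cell]
        bmin = [min(p[a] for p in pts) for a in range(3)]
        bmax = [max(p[a] for p in pts) for a in range(3)]
        center = [(bmin[a] + bmax[a]) / 2 for a in range(3)]
        half = [(bmax[a] - bmin[a]) / 2 * scale for a in range(3)]
        boxes.append((center, half))

    # Two parts are adjacent if their scaled boxes overlap on every axis.
    n = len(cells)
    neighbors = [[] for _ in range(n)]
    for i, j in combinations(range(n), 2):
        (ci, half_i), (cj, half_j) = boxes[i], boxes[j]
        if all(abs(ci[a] - cj[a]) <= half_i[a] + half_j[a] for a in range(3)):
            neighbors[i].append(j)
            neighbors[j].append(i)

    # Breadth-first search from each node yields shortest-path hop counts.
    dist = [[-1] * n for _ in range(n)]  # -1 marks unreachable pairs (assumption)
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[s][v] < 0:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


# Usage: 30 points on a line fall into three parts chained end to end.
line = [(i / 29, 0.0, 0.0) for i in range(30)]
D = hop_distance_matrix(line, grid=3, scale=1.5)
```

Breadth-first search is sufficient here because every edge of the ground-truth graph has unit cost, so the first time a node is reached gives its shortest hop count.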