Dynamic Graph CNN for Learning on Point Clouds

Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, Justin M. Solomon

TL;DR

This work introduces EdgeConv, a neural operator for point clouds that constructs and learns on local graphs whose structure is dynamically updated at each layer. By computing edge features between nearby points and aggregating them with permutation-invariant operations, the model captures local geometry while preserving global shape information, and it updates neighbor relations in feature space to propagate information broadly. The approach yields state-of-the-art results on ModelNet40 for classification, ShapeNet Part for part segmentation, and competitive performance on S3DIS for indoor semantic segmentation, illustrating the benefits of dynamic graphs and edge-centric learning. The authors also demonstrate robustness to partial data and provide extensive ablations, showing that centralization, dynamic graph recomputation, and using more points consistently improve performance, with potential for broader applications and further efficiency improvements.

Abstract

Point clouds provide a flexible geometric representation suitable for countless applications in computer graphics; they also comprise the raw output of most 3D data acquisition devices. Hand-designed features on point clouds have long been proposed in graphics and vision; however, the recent overwhelming success of convolutional neural networks (CNNs) for image analysis suggests the value of adapting insights from CNNs to the point cloud world. Point clouds inherently lack topological information, so designing a model to recover topology can enrich the representation power of point clouds. To this end, we propose a new neural network module dubbed EdgeConv suitable for CNN-based high-level tasks on point clouds, including classification and segmentation. EdgeConv acts on graphs dynamically computed in each layer of the network. It is differentiable and can be plugged into existing architectures. Compared to existing modules operating in extrinsic space or treating each point independently, EdgeConv has several appealing properties: it incorporates local neighborhood information; it can be stacked to learn global shape properties; and in multi-layer systems, affinity in feature space captures semantic characteristics over potentially long distances in the original embedding. We show the performance of our model on standard benchmarks including ModelNet40, ShapeNetPart, and S3DIS.
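To make "graphs dynamically computed in each layer" concrete, the following is a minimal NumPy sketch of building a k-nearest-neighbor graph from whatever features a layer currently holds. The function name `knn_indices`, the use of Euclidean distance, and the exclusion of each point from its own neighbor list are illustrative assumptions, not details confirmed by the text above.

```python
import numpy as np

def knn_indices(features: np.ndarray, k: int) -> np.ndarray:
    """Return, for each row of `features` (shape (n, f)), the indices of its
    k nearest rows under Euclidean distance. Output shape is (n, k).

    Assumption: a point is not counted as its own neighbor.
    """
    # Pairwise squared distances via ||a||^2 - 2 a.b + ||b||^2, shape (n, n).
    sq_norms = np.sum(features ** 2, axis=1)
    dists = sq_norms[:, None] - 2.0 * (features @ features.T) + sq_norms[None, :]
    # Exclude self-matches by making the diagonal infinitely far away.
    np.fill_diagonal(dists, np.inf)
    # Sort each row by distance and keep the k closest indices.
    return np.argsort(dists, axis=1)[:, :k]

# The graph is "dynamic" in the sense that the same routine is re-run on the
# output features of each layer, so neighbors can change from layer to layer.
points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 5.0, 0.0],
                   [0.0, 6.0, 0.0]])
graph = knn_indices(points, k=1)
```

In the first layer the input would be the raw 3D coordinates; in later layers it would be the learned per-point features, which is how points that are far apart in space can still become neighbors.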

Paper Structure

This paper contains 25 sections, 14 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Left: Computing an edge feature, $\mathbf{e}_{ij}$ (top), from a point pair, $\mathbf{x}_i$ and $\mathbf{x}_j$ (bottom). In this example, $h_{\boldsymbol{\Theta}}()$ is instantiated using a fully connected layer, and the learnable parameters are its associated weights. Right: The EdgeConv operation. The output of EdgeConv is calculated by aggregating the edge features associated with all the edges emanating from each connected vertex.
  • Figure 2: Model architectures: The model architectures used for classification (top branch) and segmentation (bottom branch). The classification model takes as input $n$ points, calculates an edge feature set of size $k$ for each point at an EdgeConv layer, and aggregates features within each set to compute EdgeConv responses for corresponding points. The output features of the last EdgeConv layer are aggregated globally to form a $1D$ global descriptor, which is used to generate classification scores for $c$ classes. The segmentation model extends the classification model by concatenating the $1D$ global descriptor and all the EdgeConv outputs (serving as local descriptors) for each point. It outputs per-point classification scores for $p$ semantic labels. $\oplus$: concatenation. Point cloud transform block: The point cloud transform block is designed to align an input point set to a canonical space by applying an estimated $3\times3$ matrix. To estimate the $3\times3$ matrix, a tensor concatenating the coordinates of each point and the coordinate differences between that point and its $k$ neighboring points is used. EdgeConv block: The EdgeConv block takes as input a tensor of shape $n\times f$, computes edge features for each point by applying a multi-layer perceptron (MLP) with the number of layer neurons defined as $\{a_1, a_2, ..., a_n\}$, and generates a tensor of shape $n\times a_n$ after pooling among neighboring edge features.
  • Figure 3: Structure of the feature spaces produced at different stages of our shape classification neural network architecture, visualized as the distance from the red point to the rest of the points. For each set, Left: Euclidean distance in the input $\mathbb{R}^3$ space; Middle: Distance after the point cloud transform stage, amounting to a global transformation of the shape; Right: Distance in the feature space of the last layer. Observe how, in the feature space of deeper layers, semantically similar structures such as shelves of a bookshelf or legs of a table are brought close together, although they are distant in the original space.
  • Figure 4: Left: Results of our model tested with random input dropout. The model is trained with 1024 points and $k = 20$. Right: Point clouds with different numbers of points. The numbers of points are shown below the bottom row.
  • Figure 5: Our part segmentation testing results for tables, chairs and lamps.
  • ...and 4 more figures
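
The EdgeConv block described in the captions of Figures 1 and 2 (an $n\times f$ input, per-edge features from a fully connected layer, then pooling over each point's edges) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `edge_conv`, the choice of edge input (the center point concatenated with the neighbor offset), the ReLU nonlinearity, the max pooling, and the single-layer MLP are all assumptions made for the example.

```python
import numpy as np

def edge_conv(x: np.ndarray, idx: np.ndarray,
              weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """One EdgeConv-style block (hypothetical single-layer version).

    x:      (n, f) per-point features
    idx:    (n, k) neighbor indices for each point
    weight: (2 * f, a) fully connected layer weights
    bias:   (a,) fully connected layer bias
    returns (n, a) pooled features
    """
    neighbors = x[idx]                                        # (n, k, f)
    center = np.broadcast_to(x[:, None, :], neighbors.shape)  # (n, k, f)
    # Assumed edge input: the center point and the offset to each neighbor.
    edge_input = np.concatenate([center, neighbors - center], axis=-1)
    # Shared fully connected layer plus ReLU, applied to every edge.
    edge_feat = np.maximum(edge_input @ weight + bias, 0.0)   # (n, k, a)
    # Max over the k edges of each point: the result does not depend on
    # the order in which neighbors are listed.
    return edge_feat.max(axis=1)                              # (n, a)

# Example shapes: n = 6 points, f = 3 input features, k = 2, a = 8 outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))
# Neighbor indices from sorted pairwise distances (column 0 is the point itself).
d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
idx = np.argsort(d, axis=1)[:, 1:3]
weight = rng.normal(size=(6, 8))
bias = np.zeros(8)
out = edge_conv(x, idx, weight, bias)
```

Stacking several such blocks, with `idx` recomputed from each block's output, corresponds to the dynamic graph update described in the TL;DR; a global pooling over the $n$ rows of the final output would then give the $1D$ global descriptor mentioned in Figure 2.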