Composite Convolution: a Flexible Operator for Deep Learning on 3D Point Clouds

Alberto Floris, Luca Frittoli, Diego Carrera, Giacomo Boracchi

TL;DR

The paper introduces the composite layer, a flexible and general alternative to existing convolutional operators for 3D point clouds, which extracts and compresses the spatial information in the 3D coordinates of points and then combines it with the feature vectors.

Abstract

Deep neural networks require specific layers to process point clouds, as the scattered and irregular location of 3D points prevents the use of conventional convolutional filters. We introduce the composite layer, a flexible and general alternative to the existing convolutional operators that process 3D point clouds. We design our composite layer to extract and compress the spatial information from the 3D coordinates of points and then combine it with the feature vectors. Compared to mainstream point-convolutional layers such as ConvPoint and KPConv, our composite layer guarantees greater flexibility in network design and provides an additional form of regularization. To demonstrate the generality of our composite layers, we define both a convolutional composite layer and an aggregate version that combines spatial information and features in a nonlinear manner, and we use these layers to implement CompositeNets. Our experiments on synthetic and real-world datasets show that, in classification, segmentation, and anomaly detection alike, our CompositeNets outperform ConvPoint, which uses the same sequential architecture, and achieve results similar to KPConv, which has a deeper, residual architecture. Moreover, our CompositeNets achieve state-of-the-art performance in anomaly detection on point clouds. Our code is publicly available at https://github.com/sirolf-otrebla/CompositeNet.
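To make the idea of "extract and compress, then combine" concrete, here is a minimal plain-Python sketch of a convolutional composite layer for a single output point $y$, based on one reading of Figures 1 and 2. It assumes that the spatial function is $s(x-y) = V^\top h(x-y)$, where $h$ is a Gaussian correlation of the offset with $M$ fixed centers (a KPConv-style choice made here only for illustration; the paper's $h$ may differ) and $V$ ($M \times K$) compresses the $M$ correlations into $K$ spatial features. The semantic function is taken to be a Frobenius inner product between a weight tensor $W$ of shape $K \times C_{in} \times C_{out}$ and the $K \times C_{in}$ matrix $\sum_{x \in X_y} s(x-y)\,\phi(x)^\top$. The helper names, `sigma`, and all sizes are hypothetical, and normalization, biases, and activations are omitted.

```python
import math
import random


# Hypothetical helper: h is ASSUMED to be a Gaussian correlation with M centers.
def correlation(d, centers, sigma=0.5):
    """h(d): one Gaussian correlation value per center."""
    return [
        math.exp(-sum((di - ci) ** 2 for di, ci in zip(d, c)) / (2 * sigma ** 2))
        for c in centers
    ]


def spatial(d, centers, V):
    """s(d) = V^T h(d): compress M correlations into K spatial features."""
    h = correlation(d, centers)
    return [sum(h[m] * V[m][k] for m in range(len(V))) for k in range(len(V[0]))]


def composite_conv(y, window, features, centers, V, W):
    """Output feature vector psi(y) for one output point y (hypothetical sketch).

    window:   points x of the convolution window X_y (3D lists)
    features: input feature vectors phi(x), same order as window
    V:        M x K matrix; W: K x C_in x C_out tensor (nested lists)
    """
    K, c_in, c_out = len(V[0]), len(features[0]), len(W[0][0])
    # A = sum over x of s(x - y) phi(x)^T, a K x C_in matrix
    A = [[0.0] * c_in for _ in range(K)]
    for x, phi in zip(window, features):
        s = spatial([xi - yi for xi, yi in zip(x, y)], centers, V)
        for k in range(K):
            for c in range(c_in):
                A[k][c] += s[k] * phi[c]
    # psi(y)[o] = Frobenius inner product of W[:, :, o] and A
    return [
        sum(W[k][c][o] * A[k][c] for k in range(K) for c in range(c_in))
        for o in range(c_out)
    ]


# Usage with random data (illustrative sizes, not from the paper).
random.seed(0)
M, K, C_IN, C_OUT, N = 8, 4, 3, 5, 16
centers = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(M)]
V = [[random.gauss(0, 1) for _ in range(K)] for _ in range(M)]
W = [[[random.gauss(0, 1) for _ in range(C_OUT)] for _ in range(C_IN)] for _ in range(K)]
y = [0.0, 0.0, 0.0]
window = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(N)]
features = [[random.gauss(0, 1) for _ in range(C_IN)] for _ in range(N)]
psi = composite_conv(y, window, features, centers, V, W)
print(len(psi))  # 5
```

In this sketch the window enters $W$ only through the $K$-dimensional vectors $s(x-y)$, so $W$ does not depend on the number of centers $M$.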

Paper Structure (18 sections, 9 equations, 3 figures, 6 tables)

Figures (3)

  • Figure 1: The operations of our composite layer on the input point cloud $P$ (the blue dots $x$, each paired with a feature vector $\phi(x)$) to obtain the output point cloud $Q$ (the red dots $y$ sampled from $P$, each paired with its output feature vector $\psi(y)$). The spatial function $s$ outputs a vector in $\mathbb{R}^K$ for each point $x$ belonging to the convolution window $X_y$, where $y$ is the output point. The semantic function combines the input features $\phi$ and the output of the spatial function $\{s(x-y)\}_{x \in X_y}$ to produce the output features $\psi(y)$.
  • Figure 2: The operations in our convolutional composite layer (a) and the well-known point-convolutional layers ConvPoint and KPConv (b) expressed in matrix form. Here $\Phi$ and $\psi(y)$ indicate, respectively, the input features of the points of the convolution window $X_y$ and the output feature vector of $y$; the matrix $H$ stacks the 3D coordinates of the points of $X_y$, processed by the correlation function $h$; $\otimes$ indicates the Frobenius inner product. In practice, our composite layer decomposes the weight tensor $\widetilde{W}$ of ConvPoint and KPConv into the product $W\cdot V$, enabling more flexibility when designing the network.
  • Figure 3: (a) Overall accuracy on ScanNet against the number of parameters, varying the number of centers $M$. We report the average processing time during training for the least and most complex versions. The parameters of ConvPoint and KPConv-vanilla increase linearly with $M$, while in our CompositeNets $M$ can be increased without significantly changing the number of parameters. (b) Overall accuracy on ModelNet40 when reducing the number of parameters by tuning $K$ ($M$ for KPConv). We also report the number of parameters of the most and least complex versions. These results show that we can substantially reduce the parameters of our CompositeNets without compromising the accuracy, while this is not the case in KPConv.
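Figures 2 and 3 rest on the claim that the composite layer factorizes the ConvPoint/KPConv weight tensor $\widetilde{W}$ into two smaller factors. The following plain-Python sketch checks this numerically under stated assumptions: $H$ is an $n \times M$ matrix standing in for the correlations $h(x-y)$ of the $n$ points of $X_y$ (random here), $V$ is $M \times K$, $W$ is $K \times C_{in} \times C_{out}$, and the merged tensor is obtained by contracting $V$ and $W$ over the $K$ index, $\widetilde{W}[m] = \sum_k V[m,k]\,W[k]$ (what the caption writes as $W\cdot V$). The helper names, sizes, and parameter-count formulas (which ignore biases and any learnable center positions) are illustrative assumptions, not values from the paper.

```python
import random


def matmul(A, B):
    """Plain-Python product of two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def frobenius_out(T, A):
    """psi[o] = <T[:, :, o], A>_F for a tensor T of shape rows x cols x C_out."""
    rows, cols, c_out = len(A), len(A[0]), len(T[0][0])
    return [sum(T[i][c][o] * A[i][c] for i in range(rows) for c in range(cols))
            for o in range(c_out)]


def rand_matrix(r, c):
    return [[random.gauss(0, 1) for _ in range(c)] for _ in range(r)]


def n_params_full(M, c_in, c_out):
    """Weights in an M x C_in x C_out tensor (biases and centers ignored)."""
    return M * c_in * c_out


def n_params_composite(M, K, c_in, c_out):
    """Weights in V (M x K) plus W (K x C_in x C_out) (biases and centers ignored)."""
    return M * K + K * c_in * c_out


random.seed(1)
n, M, K, C_IN, C_OUT = 10, 16, 4, 8, 8   # illustrative sizes, not from the paper
H = rand_matrix(n, M)        # stand-in for h(x - y) of the n window points
Phi = rand_matrix(n, C_IN)   # input features of the window points
V = rand_matrix(M, K)
W = [rand_matrix(C_IN, C_OUT) for _ in range(K)]   # K x C_in x C_out

HtPhi = matmul(transpose(H), Phi)                              # M x C_in
psi_composite = frobenius_out(W, matmul(transpose(V), HtPhi))  # compress, then combine

# Merged ConvPoint/KPConv-style tensor: W_tilde[m] = sum_k V[m][k] * W[k]
W_tilde = [[[sum(V[m][k] * W[k][c][o] for k in range(K)) for o in range(C_OUT)]
            for c in range(C_IN)] for m in range(M)]
psi_full = frobenius_out(W_tilde, HtPhi)

print(max(abs(a - b) for a, b in zip(psi_composite, psi_full)) < 1e-8)  # True
for m in (16, 32, 64):
    print(m, n_params_full(m, 64, 64), n_params_composite(m, 4, 64, 64))
# 16 65536 16448
# 32 131072 16512
# 64 262144 16640
```

With these hypothetical sizes, the full tensor grows linearly in $M$, while the factorized version adds only $K$ weights per extra center, which matches the trend described for Figure 3(a); lowering $K$ shrinks the dominant $K \cdot C_{in} \cdot C_{out}$ term, in line with Figure 3(b).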