KPConvX: Modernizing Kernel Point Convolution with Kernel Attention

Hugues Thomas, Yao-Hung Hubert Tsai, Timothy D. Barfoot, Jian Zhang

TL;DR

This work revisits kernel-point convolution by introducing KPConvD, a depthwise variant, and KPConvX, which adds geometric kernel attention. By combining these operators with modern architectural designs and training strategies, the method achieves state-of-the-art results on S3DIS, ScanNetv2, and ScanObjectNN, while maintaining efficiency through nearest-kernel operations and shell-based kernel point layouts. The ablations demonstrate substantial gains from depthwise design, attention modulation, and carefully chosen kernel shells and groupings. Overall, KPConvX offers a robust, scalable framework for 3D point-cloud understanding, bridging geometric kernels with attention-like modulation for improved accuracy and efficiency.

Abstract

In the field of deep point cloud understanding, KPConv is a unique architecture that uses kernel points to locate convolutional weights in space, instead of relying on Multi-Layer Perceptron (MLP) encodings. While it initially achieved success, it has since been surpassed by recent MLP networks that employ updated designs and training strategies. Building upon the kernel point principle, we present two novel designs: KPConvD (depthwise KPConv), a lighter design that enables the use of deeper architectures, and KPConvX, an innovative design that scales the depthwise convolutional weights of KPConvD with kernel attention values. Using KPConvX with a modern architecture and training strategy, we are able to outperform current state-of-the-art approaches on the ScanObjectNN, ScanNetv2, and S3DIS datasets. We validate our design choices through ablation studies and release our code and models.
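
To make the two operators concrete, here is a minimal NumPy sketch of a depthwise kernel point convolution at a single center point, and of the same operation with its weights scaled by attention values. It is an illustration under stated assumptions, not the authors' released implementation: the function names, the single linear layer `attn_proj`, the sigmoid activation, and the choice to compute attention from the center point's feature are all hypothetical.

```python
import numpy as np

def nearest_kernel_index(rel_pos, kernel_points):
    """Index of the closest kernel point for each neighbor.

    rel_pos:       (N, 3) neighbor positions relative to the center point
    kernel_points: (K, 3) kernel point positions
    """
    diff = rel_pos[:, None, :] - kernel_points[None, :, :]   # (N, K, 3)
    return (diff ** 2).sum(axis=-1).argmin(axis=1)           # (N,)

def kpconvd(rel_pos, feats, kernel_points, weights):
    """Depthwise sketch: one weight per kernel point and per channel.

    feats:   (N, C) neighbor features
    weights: (K, C) depthwise weights
    """
    k_idx = nearest_kernel_index(rel_pos, kernel_points)
    # Each neighbor is multiplied channel-wise by the weights of its
    # nearest kernel point, then neighbors are summed.
    return (weights[k_idx] * feats).sum(axis=0)               # (C,)

def kpconvx(rel_pos, feats, center_feat, kernel_points, weights, attn_proj):
    """Attention sketch: scale the depthwise weights before convolving.

    center_feat: (C,)      feature of the center point (assumed attention input)
    attn_proj:   (C, K*C)  hypothetical linear layer producing attention logits
    """
    K, C = weights.shape
    logits = (center_feat @ attn_proj).reshape(K, C)
    attn = 1.0 / (1.0 + np.exp(-logits))                      # sigmoid (assumed)
    return kpconvd(rel_pos, feats, kernel_points, attn * weights)
```

Because the attention values are indexed by kernel point rather than by neighbor, they weight regions of space around the center instead of individual neighbors, which is the idea the abstract describes as scaling the depthwise weights with kernel attention.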

Paper Structure (19 sections, 11 equations, 9 figures, 12 tables)

Figures (9)

  • Figure 1: KPConvD and KPConvX using small (S) and large (L) architectures outperform other state-of-the-art architectures on ScanNetv2 dataset using a relatively small number of parameters.
  • Figure 2: Illustration of our new designs compared to the original KPConv operator. KPConvD adopts a lighter depthwise design and KPConvX includes kernel attention.
  • Figure 3: Illustration of our network architecture KPConvX-L. It can be used for semantic segmentation or shape classification. It has a total of $44$ encoder blocks, plus one stem KPConv and 4 decoder blocks. We use inverted bottleneck blocks and grid subsampling from one layer to the next.
  • Figure 4: Illustration of kernel dispositions in 2D with one or two shells. Red circles are kernel points. The shells highlighted in green are placed regularly along the radius (left). The nearest-kernel area of each kernel point is shown in yellow (right).
  • Figure 5: Illustration of our kernel attention principle, where chunks of space are weighted (right) compared to standard point self-attention where the neighbors are weighted depending on their feature instead of their position (left). In our design, no position encoding is needed, as the attention itself is a position encoding.
  • ...and 4 more figures
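
The 2D kernel dispositions described in the Figure 4 caption can be sketched as follows. This is a rough illustration only: the function name, the inclusion of a center kernel point, and the even angular spacing on each shell are assumptions, and the paper's actual 3D kernel layouts may be produced differently.

```python
import math

def shell_kernel_points_2d(radius, points_per_shell):
    """Return 2D kernel points: a center point plus concentric shells.

    radius:           radius of the outermost shell
    points_per_shell: number of kernel points on each shell, inner to outer
    """
    n_shells = len(points_per_shell)
    points = [(0.0, 0.0)]  # center kernel point (assumed)
    for shell, count in enumerate(points_per_shell, start=1):
        # Shells are placed regularly along the radius.
        r = radius * shell / n_shells
        for i in range(count):
            angle = 2.0 * math.pi * i / count
            points.append((r * math.cos(angle), r * math.sin(angle)))
    return points
```

For example, `shell_kernel_points_2d(1.0, [6])` gives a one-shell layout and `shell_kernel_points_2d(1.0, [4, 8])` a two-shell layout; assigning each neighbor to its closest point in such a layout yields the nearest-kernel areas mentioned in the caption.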