EZ-SP: Fast and Lightweight Superpoint-Based 3D Segmentation

Louis Geist, Loic Landrieu, Damien Robert

TL;DR

EZ-SP tackles the CPU-bound bottleneck of partitioning in superpoint-based 3D semantic segmentation by introducing a fully GPU-based pipeline. It learns embeddings that detect semantic transitions, then forms coherent multi-level superpoints with a massively parallel greedy partition algorithm, and finally uses a lightweight superpoint classifier for dense labeling. The approach achieves 13x faster partitioning and 72x faster end-to-end inference while maintaining competitive accuracy across indoor, mobile, and aerial LiDAR benchmarks, with a minimal memory footprint (<2 MB VRAM). The work demonstrates strong generalization and practical viability for real-time perception on resource-constrained platforms, and provides open-source code and pretrained models.

Abstract

Superpoint-based pipelines provide an efficient alternative to point- or voxel-based 3D semantic segmentation, but are often bottlenecked by their CPU-bound partition step. We propose a learnable, fully GPU-based partitioning algorithm that generates geometrically and semantically coherent superpoints 13$\times$ faster than prior methods. Our module is compact (under 60k parameters), trains in under 20 minutes with a differentiable surrogate loss, and requires no handcrafted features. Combined with a lightweight superpoint classifier, the full pipeline fits in $<$2 MB of VRAM, scales to multi-million-point scenes, and supports real-time inference. With 72$\times$ faster inference and 120$\times$ fewer parameters, EZ-SP matches the accuracy of point-based SOTA models across three domains: indoor scans (S3DIS), autonomous driving (KITTI-360), and aerial LiDAR (DALES). Code and pretrained models are accessible at github.com/drprojects/superpoint_transformer.
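To make the last stage of the pipeline concrete, here is a minimal sketch of how superpoint-level labels can be turned into a dense per-point segmentation, and of why partition quality caps the achievable accuracy. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy scene, and the majority-vote "oracle" (each superpoint takes its most frequent ground-truth label) are all hypothetical choices made for this example.

```python
from collections import Counter


def oracle_superpoint_labels(point_to_superpoint, point_labels, num_superpoints):
    """Majority ground-truth label of each superpoint.

    Assumes every superpoint contains at least one point. This mimics the
    best result a superpoint classifier could reach on a given partition.
    """
    votes = [Counter() for _ in range(num_superpoints)]
    for sp, label in zip(point_to_superpoint, point_labels):
        votes[sp][label] += 1
    return [v.most_common(1)[0][0] for v in votes]


def broadcast_labels(point_to_superpoint, superpoint_labels):
    """Copy each superpoint's label to every point it contains."""
    return [superpoint_labels[sp] for sp in point_to_superpoint]


# Hypothetical toy scene: 6 points grouped into 3 superpoints.
# Superpoint 0 is impure: it mixes two "floor" points with one "chair" point.
point_to_superpoint = [0, 0, 0, 1, 1, 2]
point_labels = ["floor", "floor", "chair", "chair", "chair", "wall"]

sp_labels = oracle_superpoint_labels(point_to_superpoint, point_labels, 3)
dense = broadcast_labels(point_to_superpoint, sp_labels)

# Points whose broadcast label matches the ground truth.
num_correct = sum(p == t for p, t in zip(dense, point_labels))
```

In this toy scene the impure superpoint forces one point to be mislabeled even with a perfect superpoint classifier, which is why the partition step must produce semantically coherent superpoints.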

Paper Structure

This paper contains 16 sections, 2 theorems, 12 equations, 7 figures, 3 tables, 2 algorithms.

Key Result

Proposition 1

Merging adjacent superpoints $(P,Q) \in \bm{\mathcal{E}}$ decreases $\Omega(\mathcal{P})$ by the following edge merge gain:

Figures (7)

  • Figure 1: Inference Speed vs. Performance vs. Model Size. Comparison of end-to-end pipelines (preprocessing to inference) on S3DIS. EZ-SP achieves near-SOTA accuracy with only $400$k parameters, while being orders of magnitude faster than point-based networks, and is the only method to match the acquisition rate of automotive LiDAR.
  • Figure 2: EZ-SP. A $60$k-parameter backbone embeds every point of the input scene into a low-dimensional space where adjacent points from different semantic classes (inter-edges) are pushed apart. A GPU-accelerated algorithm then clusters neighbouring points with similar embeddings, producing semantically homogeneous superpoints. Finally, a lightweight ($330$k-parameter) superpoint-level network assigns a label to each superpoint, which is broadcast back to its points for dense segmentation.
  • Figure 3: Parallel Combinatorial Partition. Our algorithm greedily approximates a graph signal with piecewise-constant components. Conflicting merges (nodes with multiple outgoing edges) are removed, enabling an efficient parallel implementation on GPUs.
  • Figure 4: Partition Examples. Visualization of point cloud partitions across three datasets and three partitioning algorithms. The input panel shows the full dataset sizes; we also report, for each configuration, the resulting number of superpoints and the partition purity over the validation dataset (all folds for S3DIS).
  • Figure 5: Oversegmentation Performance. Oracle mIoU as a function of the number of superpoints on S3DIS, KITTI-360, and DALES. We also report the throughput (from raw points to superpoints) on S3DIS, with error bars indicating variance across configurations. EZ-SP achieves partition purity comparable to or better than PCP while being over 13$\times$ faster.
  • ...and 2 more figures
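
The conflict-removal idea in the Figure 3 caption can be illustrated with a small sequential sketch of one merge round. This is a simplified, Borůvka-style stand-in and not the authors' GPU algorithm: the function name `merge_round`, the rule "each node keeps only its best positive-gain edge", and the gain values are assumptions made for illustration. In particular, the gains below are arbitrary numbers and are not computed from the edge merge gain of Proposition 1.

```python
def merge_round(num_nodes, edges, gains):
    """One merge round: returns a component id for every node."""
    # Step 1: every node keeps only its best positive-gain edge, so no node
    # has more than one outgoing merge candidate (competing edges are dropped).
    best = {}  # node -> (gain, neighbor)
    for (u, v), g in zip(edges, gains):
        if g <= 0:
            continue  # merging across this edge would not help
        for a, b in ((u, v), (v, u)):
            if a not in best or g > best[a][0]:
                best[a] = (g, b)

    # Step 2: contract the kept edges with a union-find structure.
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, (_, b) in best.items():
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Step 3: relabel the components with consecutive ids.
    roots = sorted({find(x) for x in range(num_nodes)})
    new_id = {r: i for i, r in enumerate(roots)}
    return [new_id[find(x)] for x in range(num_nodes)]


# Hypothetical chain graph 0-1-2-3-4 with made-up merge gains.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
gains = [5.0, 1.0, -2.0, 3.0]
assignment = merge_round(5, edges, gains)
```

Because each node contributes at most one outgoing edge, the selection step needs no coordination between nodes, which is the property that makes such a round easy to parallelize. On a GPU the per-node selection and the contraction would be expressed as batched tensor operations rather than Python loops.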

Theorems & Definitions (3)

  • Proposition 1
  • Proposition 1
  • proof