EZ-SP: Fast and Lightweight Superpoint-Based 3D Segmentation
Louis Geist, Loic Landrieu, Damien Robert
TL;DR
EZ-SP tackles the CPU-bound bottleneck of partitioning in superpoint-based 3D semantic segmentation by introducing a fully GPU-based pipeline. It learns embeddings that detect semantic transitions, then forms coherent multi-level superpoints with a massively parallel greedy partition algorithm, and finally uses a lightweight superpoint classifier for dense labeling. The approach achieves 13x faster partitioning and 72x faster end-to-end inference while maintaining competitive accuracy across indoor, mobile, and aerial LiDAR benchmarks, with a minimal memory footprint (<2 MB VRAM). The work demonstrates strong generalization and practical viability for real-time perception on resource-constrained platforms, and provides open-source code and pretrained models.
Abstract
Superpoint-based pipelines provide an efficient alternative to point- or voxel-based 3D semantic segmentation, but are often bottlenecked by their CPU-bound partition step. We propose a learnable, fully GPU-based partitioning algorithm that generates geometrically and semantically coherent superpoints 13$\times$ faster than prior methods. Our module is compact (under 60k parameters), trains in under 20 minutes with a differentiable surrogate loss, and requires no handcrafted features. Combined with a lightweight superpoint classifier, the full pipeline fits in $<$2 MB of VRAM, scales to multi-million-point scenes, and supports real-time inference. With 72$\times$ faster inference and 120$\times$ fewer parameters, EZ-SP matches the accuracy of point-based SOTA models across three domains: indoor scans (S3DIS), autonomous driving (KITTI-360), and aerial LiDAR (DALES). Code and pretrained models are accessible at github.com/drprojects/superpoint_transformer.
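To make the three-stage idea concrete (per-point embeddings that expose semantic transitions, a partition into superpoints, and superpoint-level labels broadcast back to points), the following is a minimal CPU sketch in plain Python. It is not the EZ-SP implementation: the parallel greedy GPU partition is replaced here by a simple thresholded connected-components pass with union-find, and the function names `partition_by_transitions` and `densify`, the `threshold` parameter, and the toy data are all hypothetical, chosen only for illustration.

```python
def partition_by_transitions(embeddings, edges, threshold):
    """Group points into superpoints by merging across low-contrast edges.

    Simplified stand-in (thresholded connected components), NOT the paper's
    parallel greedy partition. `embeddings` is a list of per-point vectors,
    `edges` a list of (i, j) neighbor pairs.
    """
    parent = list(range(len(embeddings)))

    def find(i):
        # Union-find lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        # Squared Euclidean distance between the two point embeddings
        dist2 = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
        if dist2 <= threshold ** 2:
            # Similar embeddings: no transition detected, so merge
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    # Relabel roots to consecutive superpoint ids in order of first appearance
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(embeddings))]


def densify(superpoint_of_point, superpoint_labels):
    """Broadcast one predicted label per superpoint back to every point."""
    return [superpoint_labels[s] for s in superpoint_of_point]


# Toy 1D embeddings along a chain of 5 points, with a jump between points 2 and 3
embeddings = [[0.0], [0.1], [0.2], [5.0], [5.1]]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
superpoints = partition_by_transitions(embeddings, edges, threshold=0.5)
dense = densify(superpoints, ["floor", "wall"])
```

In this toy case the large embedding jump between points 2 and 3 acts as the detected transition, so the chain splits into two superpoints, and classifying two superpoints is enough to label all five points. That reduction in the number of elements to classify is what makes superpoint-based pipelines cheap at inference time.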
