RESSCAL3D: Resolution Scalable 3D Semantic Segmentation of Point Clouds
Remco Royen, Adrian Munteanu
TL;DR
RESSCAL3D addresses the need for resolution-scalable 3D semantic segmentation of point clouds by enabling initial fast predictions on low-resolution data and progressively refining as more points are captured. It introduces a multi-scale architecture that partitions the input into s resolutions, uses a PointTransformer backbone per scale, and fuses multi-resolution features through a KNN-based fusion module with Conv1D and MaxPool. Training is performed scale-by-scale with previous scales frozen, and the approach reduces attention complexity from $O(N^{2})$ to $O( abla \,\sum_{i=1}^{s} N_i^{2})$ while maintaining accuracy close to a non-scalable baseline. On S3DIS, RESSCAL3D achieves 31–62% faster inference with only a minor mIoU drop (within about 2.1% at the highest scale) and provides early intermediate predictions around 6% of the total inference time, showcasing practical benefits for streaming 3D data acquisition and real-time scene understanding.
Abstract
While deep learning-based methods have demonstrated outstanding results in numerous domains, some important functionalities are missing. Resolution scalability is one of them. In this work, we introduce a novel architecture, dubbed RESSCAL3D, providing resolution-scalable 3D semantic segmentation of point clouds. In contrast to existing works, the proposed method does not require the whole point cloud to be available to start inference. Once a low-resolution version of the input point cloud is available, first semantic predictions can be generated in an extremely fast manner. This enables early decision-making in subsequent processing steps. As additional points become available, these are processed in parallel. To improve performance, features from previously computed scales are employed as prior knowledge at the current scale. Our experiments show that RESSCAL3D is 31-62% faster than the non-scalable baseline while keeping a limited impact on performance. To the best of our knowledge, the proposed method is the first to propose a resolution-scalable approach for 3D semantic segmentation of point clouds based on deep learning.
