RESSCAL3D: Resolution Scalable 3D Semantic Segmentation of Point Clouds

Remco Royen; Adrian Munteanu

TL;DR

RESSCAL3D addresses the need for resolution-scalable 3D semantic segmentation of point clouds: it produces fast initial predictions on low-resolution data and progressively refines them as more points are captured. It introduces a multi-scale architecture that partitions the input into $s$ resolutions, uses a PointTransformer backbone per scale, and fuses multi-resolution features through a KNN-based fusion module with Conv1D and MaxPool. Training is performed scale by scale with previous scales frozen. The approach reduces attention complexity from $O(N^{2})$ to $O(\sum_{i=1}^{s} N_i^{2})$, where $N_i$ is the number of points at scale $i$, while maintaining accuracy close to a non-scalable baseline. On S3DIS, RESSCAL3D achieves 31–62% faster inference with only a minor mIoU drop (within about 2.1% at the highest scale) and provides early intermediate predictions at around 6% of the total inference time, showcasing practical benefits for streaming 3D data acquisition and real-time scene understanding.
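
To make the complexity claim concrete, the following is a minimal sketch that compares the quadratic cost of attending over all $N$ points at once with the summed per-scale cost. The scale sizes are hypothetical values chosen for illustration, not the partition used in the paper, and the function names are assumptions. The count covers pairwise interactions only; it does not model the fusion module or measured latency.

```python
def full_attention_cost(n_points):
    """Cost proxy for attention over the whole cloud: N^2 point pairs."""
    return n_points ** 2


def scalable_attention_cost(scale_sizes):
    """Cost proxy when each scale attends only within itself: sum of N_i^2."""
    return sum(n ** 2 for n in scale_sizes)


# Hypothetical partition of an 8000-point cloud into s = 4 disjoint scales.
scale_sizes = [1000, 1000, 2000, 4000]
n_total = sum(scale_sizes)

baseline = full_attention_cost(n_total)          # 8000^2
scalable = scalable_attention_cost(scale_sizes)  # 1000^2 + 1000^2 + 2000^2 + 4000^2
ratio = scalable / baseline

print(f"baseline pairs: {baseline}")
print(f"scalable pairs: {scalable}")
print(f"scalable / baseline: {ratio}")
```

Because the scales are disjoint subsets with $\sum_i N_i = N$, the summed cost is always at most $N^{2}$, and an even split into $s$ parts gives $N^{2}/s$.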

Abstract

While deep learning-based methods have demonstrated outstanding results in numerous domains, some important functionalities are missing. Resolution scalability is one of them. In this work, we introduce a novel architecture, dubbed RESSCAL3D, providing resolution-scalable 3D semantic segmentation of point clouds. In contrast to existing works, the proposed method does not require the whole point cloud to be available to start inference. Once a low-resolution version of the input point cloud is available, first semantic predictions can be generated in an extremely fast manner. This enables early decision-making in subsequent processing steps. As additional points become available, these are processed in parallel. To improve performance, features from previously computed scales are employed as prior knowledge at the current scale. Our experiments show that RESSCAL3D is 31-62% faster than the non-scalable baseline while keeping a limited impact on performance. To the best of our knowledge, the proposed method is the first to propose a resolution-scalable approach for 3D semantic segmentation of point clouds based on deep learning.
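
The reuse of previous-scale features as prior knowledge can be sketched as follows. This is an illustrative, dependency-free approximation of a KNN-based fusion step (nearest-neighbour gather, a shared per-point linear map standing in for a Conv1D with kernel size 1, max-pooling over the neighbours, then concatenation). The function names, the value of `k`, and the weight matrix are all assumptions; the paper's actual module may differ in its details.

```python
import math


def knn_indices(query, points, k):
    """Indices of the k points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda j: math.dist(query, points[j]))
    return order[:k]


def linear(feature, weight):
    """Shared per-point linear map, comparable to a Conv1D with kernel size 1."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weight]


def fuse(cur_xyz, cur_feat, prev_xyz, prev_feat, weight, k=2):
    """Append pooled previous-scale features to each current-scale feature."""
    fused = []
    for xyz, feat in zip(cur_xyz, cur_feat):
        neighbours = knn_indices(xyz, prev_xyz, k)
        projected = [linear(prev_feat[j], weight) for j in neighbours]
        # Max-pool channel-wise over the k gathered neighbours.
        pooled = [max(channel) for channel in zip(*projected)]
        fused.append(feat + pooled)  # concatenation of current and prior features
    return fused


# Toy data (hypothetical): three previous-scale points, two current-scale points.
prev_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
prev_feat = [[1.0, 2.0], [3.0, 0.0], [9.0, 9.0]]
cur_xyz = [(0.1, 0.0, 0.0), (4.9, 5.0, 5.0)]
cur_feat = [[0.5], [0.7]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for learned weights

fused = fuse(cur_xyz, cur_feat, prev_xyz, prev_feat, identity, k=2)
print(fused)
```

In this sketch the previous-scale features are only read, never updated, which matches the idea of keeping earlier scales frozen while a new scale is processed.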

Paper Structure (5 sections, 1 equation, 5 figures, 1 table)

Introduction
Proposed method
Experiments
Ablation Study
Conclusion

Figures (5)

Figure 1: The RESSCAL3D architecture. The grey circle with 'C' stands for concatenation.
Figure 2: RESSCAL3D fusion module
Figure 3: Ablation study and comparison of the scalable RESSCAL3D with the non-scalable baseline
Figure 4: Comparison of RESSCAL3D with the non-scalable baseline in inference time. The actual inference latency is bounded to the yellow zone. The displayed non-scalable baseline timing results are not cumulative.
Figure 5: Visualization of S3DIS results for RESSCAL3D. The input data and semantic prediction are visualized on the top and bottom row, respectively. Non-cumulative time is used.
