Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes

Kang You, Kai Liu, Li Yu, Pan Gao, Dandan Ding

TL;DR

Pointsoup tackles the challenge of compressing large-scale, sparse point cloud geometry while achieving extremely low decoding latency. It introduces a point-based pipeline with aligned window-based down-sampling (AWDS), dilated window-based entropy modeling (DWEM), and dilated window-based up-sampling (DWUS), enabling variable-rate control with a single lightweight model. Empirical results on indoor and outdoor benchmarks show state-of-the-art rate-distortion performance, with decoding up to 90$\sim$160$\times$ faster than the G-PCCv23 Trisoup decoder and a compact 2.9 MB model. The approach generalizes well across domains and supports real-time, million-scale decoding, with potential benefits for downstream tasks in the compressed domain.

Abstract

Despite considerable progress in point cloud geometry compression, effectively compressing large-scale scenes with sparse surfaces remains a challenge. Another key challenge lies in reducing decoding latency, a crucial requirement in real-world applications. In this paper, we propose Pointsoup, an efficient learning-based geometry codec that attains high performance and extremely low decoding latency simultaneously. Inspired by the conventional Trisoup codec, a point model-based strategy is devised to characterize local surfaces. Specifically, skin features are embedded from local windows via an attention-based encoder, and dilated windows are introduced as cross-scale priors to infer the distribution of quantized features in parallel. During decoding, features undergo fast refinement, followed by a folding-based point generator that quickly reconstructs point coordinates. Experiments show that Pointsoup achieves state-of-the-art performance on multiple benchmarks with significantly lower decoding complexity, i.e., up to 90$\sim$160$\times$ faster than the G-PCCv23 Trisoup decoder on a comparatively low-end platform (e.g., one RTX 2080Ti). Furthermore, it offers variable-rate control with a single neural model (2.9MB), which is attractive for industrial practitioners.

Paper Structure (25 sections, 13 equations, 12 figures, 5 tables)

This paper contains 25 sections, 13 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Quantitative compression results of proposed Pointsoup and G-PCCv23. Colors are rendered by nearest mapping. The "conferenceRoom_1" in S3DIS Area 6 is used as an example, which has 1,067,709 points. Our method allows for the decoding of a million-scale point cloud geometry in 39 ms with only one RTX 2080Ti GPU while guaranteeing superior visual quality.
  • Figure 2: Pointsoup workflow. AWDS refers to the Aligned Window-based Down-Sampling module; DWEM denotes the Dilated Window-based Entropy Modeling module; DWUS represents the Dilated Window-based Up-Sampling module; AE and AD are for arithmetic encoding and decoding; Q denotes quantization.
  • Figure 3: Aligned Window-based Down-Sampling (AWDS) module.
  • Figure 4: Attention-based aggregation of AWDS module. The self-attention block is presented in the dotted line.
  • Figure 5: Dilated window-based entropy modeling. The dilated window, obtained by computing the $k$ nearest neighbors upon down-sampled bones, is introduced as a cross-scale prior.
  • ...and 7 more figures
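
The caption of Figure 5 describes the dilated window as the $k$ nearest neighbors computed upon the down-sampled bones. The following is a minimal sketch of that neighbor lookup only, not the authors' implementation. The function name `dilated_windows`, the brute-force search, and the choice to let each bone count as its own nearest neighbor are all assumptions made for illustration.

```python
import math


def dilated_windows(bones, k):
    """Return, for each bone, the indices of its k nearest bones.

    Hypothetical helper: brute-force O(n^2 log n) search, and each bone
    is included in its own window (distance 0). Both are assumptions.
    """
    windows = []
    for center in bones:
        # Rank every bone index by Euclidean distance to the current bone.
        order = sorted(range(len(bones)),
                       key=lambda i: math.dist(center, bones[i]))
        windows.append(order[:k])
    return windows


# Toy example: four down-sampled bones in 3D.
bones = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
windows = dilated_windows(bones, k=3)
```

Each entry of `windows` lists the bone indices whose features would act as the cross-scale prior for that bone. At the million-point scale shown in Figure 1, a practical codec would presumably replace the brute-force loop with a GPU or tree-based neighbor search.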