Voxel-based Point Cloud Geometry Compression with Space-to-Channel Context

Bojun Liu, Yangzhi Ma, Ao Luo, Li Li, Dong Liu

TL;DR

This work addresses the inefficiencies of voxel-based point cloud geometry compression, particularly the limited receptive field at high bit depths. It introduces stage-wise Space-to-Channel (S2C) context modeling for dense and low-level sparse data and a level-wise S2C framework with Geometry Residual Coding (GRC) and Residual Probability Approximation (RPA) for high-level sparse data, aided by a spherical coordinate representation. By transforming spatial expansion into channel expansion, the method expands the receptive field without upsampling, achieving better bit-rate savings and lower encoding/decoding complexity than state-of-the-art voxel-based approaches. Experimental results across dense and sparse datasets validate the effectiveness and efficiency of S2C, highlighting practical improvements for real-world point cloud compression tasks.

Abstract

Voxel-based methods are among the most efficient for point cloud geometry compression, particularly with dense point clouds. However, they face limitations due to a restricted receptive field, especially when handling high-bit-depth point clouds. To overcome this issue, we introduce a stage-wise Space-to-Channel (S2C) context model for both dense point clouds and low-level sparse point clouds. This model uses a channel-wise autoregressive strategy to integrate neighborhood information at a coarse resolution. For high-level sparse point clouds, we further propose a level-wise S2C context model that addresses resolution limitations by incorporating Geometry Residual Coding (GRC) for consistent-resolution cross-level prediction. Additionally, we use the spherical coordinate system for its compact representation and enhance our GRC approach with a Residual Probability Approximation (RPA) module, which features a large kernel size. Experimental results show that our S2C context model not only achieves bit savings while maintaining or improving reconstruction quality but also reduces computational complexity compared to state-of-the-art voxel-based compression methods.
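
The space-to-channel idea in the abstract can be illustrated with a small data-layout sketch. It is not the authors' implementation: the function names and the child-index ordering (x as the most significant bit) are assumptions made for illustration, and the learned context model and entropy coder are omitted entirely. The sketch only shows how the occupied voxels of a finer level can be stored as 8 occupancy channels attached to their parent voxels at the coarser level, so that no spatial upsampling is needed to describe the finer level.

```python
def space_to_channel(points_fine):
    """Pack occupied voxels of the finer level into 8 channels per parent voxel.

    points_fine: iterable of non-negative integer (x, y, z) voxel coordinates.
    Returns a dict mapping each parent coordinate at the coarser level to a
    list of 8 occupancy flags, one per position in its 2x2x2 sub-space.
    """
    channels_by_parent = {}
    for x, y, z in points_fine:
        parent = (x >> 1, y >> 1, z >> 1)  # coordinate at the coarser level
        # Assumed ordering: x is the high bit, z the low bit of the child index.
        child = ((x & 1) << 2) | ((y & 1) << 1) | (z & 1)
        channels = channels_by_parent.setdefault(parent, [0] * 8)
        channels[child] = 1
    return channels_by_parent


def channel_to_space(channels_by_parent):
    """Inverse mapping: rebuild finer-level voxels from the occupancy channels."""
    points = []
    for (px, py, pz), channels in channels_by_parent.items():
        for child, occupied in enumerate(channels):
            if occupied:
                points.append((
                    2 * px + ((child >> 2) & 1),
                    2 * py + ((child >> 1) & 1),
                    2 * pz + (child & 1),
                ))
    return points


fine = [(0, 0, 0), (1, 1, 1), (2, 3, 0), (3, 2, 1)]
packed = space_to_channel(fine)
restored = channel_to_space(packed)
```

In this toy layout, a stage-wise scheme would code the 8 channels in groups, conditioning later channels on those already decoded, while every convolution keeps operating at the coarser level's resolution.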

Paper Structure

This paper contains 21 sections, 3 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: The bit rate consumption across each level of the voxel-based method wang2022sparse on the Ford dataset. The bit rate for the initial few levels is approximately 0, while the bit rate for the highest levels tends to stabilize at 3 bits per point.
  • Figure 2: Top: The architecture of the stage-wise Space-to-Channel (S2C) context model from level $i$ to level $i+1$. The model bypasses upsampling by converting the sub-space of level $i+1$ to the channel of level $i$. At each stage, the occupancy probability for each sub-voxel is predicted and encoded, allowing the voxels at level $i+1$ to be reconstructed from the decoded feature channels of level $i$ through the feature-to-point module. The symbol $*$ denotes that the corresponding module employs parameter sharing across all stages. Bottom: The architecture of the level-wise Space-to-Channel (S2C) context model from level $j$ to level $j+2$. The residual position probability of level $j+1$ relative to level $j$ is predicted using a Residual Probability Approximation (RPA) model. Based on the predicted probability distribution, the relative residual position is encoded and decoded. The same iterative process is then applied from level $j+1$ to level $j+2$. The red boxes denote the resolution of voxels at each level. The receptive field of sparse convolution remains constant throughout the inference process in our proposed S2C context model.
  • Figure 3: The number of points across levels in Cartesian and Spherical coordinate systems for the KITTI behley2019semantickitti and Ford pandey2011ford datasets. The Cartesian coordinate space exhibits a maximum of 18 levels for both KITTI and Ford datasets. In contrast, the Spherical coordinate space reaches a maximum of 16 levels for KITTI and 17 for Ford.
  • Figure 4: Left: Illustration of the proposed Residual Probability Approximation (RPA) model. The model begins by analyzing the input sparse tensor with the Large-scale Deep Feature Aggregation (L-DFA) module. It then divides the tensor into two groups (Group 1 and Group 2). The first group's features are used to estimate the probability distribution of its voxel residuals. The second group's probability distribution is predicted using both its own features and the encoded information from the first group's voxels. Middle: Overview of the L-DFA architecture. This module employs a large kernel and dilated convolutions to broaden the receptive field, building upon the DFA module used in wang2022sparse. Right: The architecture of Inception-ResNet (IRN), which is the basic module of our proposed L-DFA.
  • Figure 5: Quantitative rate-distortion results of our proposed level-wise S2C context model on KITTI and Ford datasets. The baselines are voxel-based methods SparsePCGC wang2022sparse, Spher-SparsePCGC, and traditional codec G-PCC g-pccmpeg.
  • ...and 1 more figure
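
The Cartesian-versus-spherical comparison in Figure 3 rests on a coordinate change, which can be sketched as follows. This is only a sketch under stated assumptions: the azimuth/elevation convention below is one common choice, the function names are hypothetical, and the excerpt does not specify the paper's exact convention or quantization steps.

```python
import math


def cartesian_to_spherical(x, y, z):
    """Convert a point to (rho, theta, phi).

    Assumed convention: rho is the range, theta the azimuth in (-pi, pi],
    and phi the elevation above the xy-plane in [-pi/2, pi/2].
    """
    rho = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)
    phi = math.atan2(z, math.hypot(x, y))
    return rho, theta, phi


def spherical_to_cartesian(rho, theta, phi):
    """Inverse of cartesian_to_spherical under the same convention."""
    x = rho * math.cos(phi) * math.cos(theta)
    y = rho * math.cos(phi) * math.sin(theta)
    z = rho * math.sin(phi)
    return x, y, z


rho, theta, phi = cartesian_to_spherical(1.0, 1.0, 0.0)
x, y, z = spherical_to_cartesian(rho, theta, phi)
```

A codec working in this space would quantize (rho, theta, phi) before voxelization; the step sizes determine how many levels are needed and are not given in this excerpt.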