PVContext: Hybrid Context Model for Point Cloud Compression

Guoqing Zhang, Wenbo Zhao, Jian Liu, Yuanchao Bai, Junjun Jiang, Xianming Liu

TL;DR

PVContext addresses the bottleneck of context size in octree-based point cloud compression by uniting two complementary modalities: a local Voxel Context and a global Point Context. The approach feeds a dual-encoder, single-decoder entropy model that predicts node occupancy, with a cross-entropy objective guiding optimization. Empirically, PVContext yields notable bitrate reductions across both LiDAR and object point clouds, outperforming G-PCC and OctAttention on SemanticKITTI, MVUB, and MPEG 8i datasets, while ablations confirm the complementary value of combining voxel and point cues. This hybrid context framework enables more efficient, scalable point cloud compression with robust performance across varying geometry precisions and data regimes.

Abstract

Efficient storage of large-scale point cloud data has become increasingly challenging due to advancements in scanning technology. Recent deep learning techniques have revolutionized this field; however, most existing approaches rely on single-modality contexts, such as octree nodes or voxel occupancy, which limits their ability to capture information across large regions. In this paper, we propose PVContext, a hybrid context model for effective octree-based point cloud compression. PVContext comprises two components with distinct modalities: the Voxel Context, which accurately represents local geometric information using voxels, and the Point Context, which efficiently preserves global shape information from point clouds. By integrating these two contexts, we retain detailed information across large areas while controlling the context size. The combined context is then fed into a deep entropy model to accurately predict occupancy. Experimental results demonstrate that, compared to G-PCC, our method reduces the bitrate by 37.95% on SemanticKITTI LiDAR point clouds and by 48.98% and 36.36% on dense object point clouds from MPEG 8i and MVUB, respectively.
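
To make the "octree-based" part of the abstract concrete, the following is a minimal sketch of how a voxelized point cloud can be turned into per-node 8-bit occupancy codes, which are the symbols an entropy model of this kind predicts. It is not the authors' implementation: the function name `octree_occupancy`, the integer-coordinate input, and the child-bit ordering (x as the most significant bit) are assumptions chosen for illustration.

```python
from collections import defaultdict


def octree_occupancy(points, depth):
    """Return one dict per octree level: {node index (x, y, z): occupancy code}.

    Assumptions (illustrative, not taken from the paper):
      - points are non-negative integer (x, y, z) tuples, each coordinate < 2**depth
      - child bit order is (x, y, z), with x as the most significant bit
    """
    levels = []
    for level in range(depth):
        # Number of low-order bits to drop to get the child cell at this level.
        shift = depth - level - 1
        codes = defaultdict(int)
        for x, y, z in points:
            cx, cy, cz = x >> shift, y >> shift, z >> shift  # child cell index
            parent = (cx >> 1, cy >> 1, cz >> 1)             # node being coded
            child_bit = ((cx & 1) << 2) | ((cy & 1) << 1) | (cz & 1)
            codes[parent] |= 1 << child_bit                  # mark child occupied
        levels.append(dict(codes))
    return levels


if __name__ == "__main__":
    # Two points at opposite corners of a 4 x 4 x 4 grid (depth 2).
    for level, nodes in enumerate(octree_occupancy([(0, 0, 0), (3, 3, 3)], 2)):
        print(level, nodes)
```

Each non-empty node yields one code in the range 1 to 255; a context model such as the one described here estimates a probability distribution over those codes before they are entropy coded.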

Paper Structure (14 sections, 5 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: The overview of our method. The input point cloud is first processed using an octree. To predict the occupancy state of the current node (blue), we form its previously encoded nodes (orange) as the Voxel Context, and the neighbor points (light blue) of its parent node (red) as the Point Context. These contexts are then fed to an encoder-decoder-based network, which predicts the occupancy probability of the current node. Finally, arithmetic encoding is used to compress the octree into a bitstream based on the estimated state distribution.
  • Figure 2: Results of different methods on SemanticKITTI at different bitrates.
  • Figure 3: Performance at different geometry precision.
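
The last step in Figure 1, arithmetic coding driven by the predicted occupancy distribution, ties the bitrate directly to the cross-entropy objective mentioned in the TL;DR: an ideal arithmetic coder spends about -log2 p(s) bits on a symbol s that the model assigned probability p(s). The sketch below estimates that cost. The function name `estimated_bits` and the 256-entry probability table per node are assumptions for illustration, not details taken from the paper.

```python
import math


def estimated_bits(occupancy_codes, predicted_probs):
    """Ideal code length, in bits, of a sequence of occupancy codes.

    occupancy_codes: list of ints in [0, 255], one per octree node.
    predicted_probs: list of 256-entry probability tables, one per node
                     (hypothetical model output; each table sums to 1).
    """
    total = 0.0
    for code, probs in zip(occupancy_codes, predicted_probs):
        # Cost of one symbol under the model; summed over nodes this is the
        # cross-entropy between the true codes and the predictions.
        total += -math.log2(probs[code])
    return total


if __name__ == "__main__":
    codes = [129, 1, 128]

    # A model with no knowledge: uniform over 256 codes, 8 bits per node.
    uniform = [[1.0 / 256] * 256 for _ in codes]
    print(estimated_bits(codes, uniform))

    # A sharper model: half of the probability mass on the true code.
    sharp = []
    for c in codes:
        table = [0.5 / 255] * 256
        table[c] = 0.5
        sharp.append(table)
    print(estimated_bits(codes, sharp))
```

Under these assumptions the uniform model costs 8 bits per node and the sharper one costs 1 bit per node, which is why a context that yields better occupancy predictions translates into a lower bitrate.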