PIVOT-Net: Heterogeneous Point-Voxel-Tree-based Framework for Point Cloud Compression
Jiahao Pang, Kevin Bui, Dong Tian
TL;DR
PIVOT-Net tackles the challenge of compressing point clouds across varying bit-depths by unifying point-, voxel-, and tree-based representations within a single learning-based framework. It assigns the coarse bits to tree coding, the middle bits to voxel-domain processing with context-aware upsampling and an enhanced voxel transformer, and the fine bits to point-based networks, enabling rate-distortion (RD) optimized reconstruction. The approach delivers state-of-the-art results on diverse datasets, including solid, dense, sparse, and LiDAR point clouds, and shows clear gains over baselines such as G-PCC, GRASP-Net, and SparsePCGC. This heterogeneous framework offers practical benefits for efficient, scalable point cloud compression (PCC) across real-world applications with different sparsity and detail characteristics.
Abstract
The universality of the point cloud format enables many 3D applications, making the compression of point clouds a critical phase in practice. Sampled as discrete 3D points, a point cloud approximates 2D surface(s) embedded in 3D with a finite bit-depth. However, the point distribution of a practical point cloud changes drastically as its bit-depth increases, requiring different methodologies for effective consumption/analysis. In this regard, a heterogeneous point cloud compression (PCC) framework is proposed. We unify typical point cloud representations -- point-based, voxel-based, and tree-based representations -- and their associated backbones under a learning-based framework to compress an input point cloud at different bit-depth levels. Having recognized the importance of voxel-domain processing, we augment the framework with a proposed context-aware upsampling for decoding and an enhanced voxel transformer for feature aggregation. Extensive experimentation demonstrates the state-of-the-art performance of our proposal on a wide range of point clouds.
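To make the idea of coding a point cloud at different bit-depth levels concrete, the sketch below partitions integer coordinates into a coarse, a middle, and a fine bit group, which would correspond to the tree-, voxel-, and point-based stages respectively. It illustrates only the bit arithmetic. The function names split_bit_depth and merge_bit_depth and the example bit allocation (4/3/3 out of 10 bits) are assumptions made for illustration; they are not taken from the paper, and the actual framework processes each level with learned networks rather than plain bit masks.

```python
import numpy as np


def split_bit_depth(coords, total_bits, tree_bits, voxel_bits):
    """Split integer coordinates into coarse, middle, and fine bit groups.

    The bit allocation is a hypothetical parameter of this sketch.
    """
    point_bits = total_bits - tree_bits - voxel_bits
    if point_bits < 0:
        raise ValueError("tree_bits + voxel_bits exceeds total_bits")
    coords = np.asarray(coords, dtype=np.int64)
    # Most significant bits: the coarse geometry (tree level in this sketch).
    coarse = coords >> (voxel_bits + point_bits)
    # Middle bits: refinement inside each coarse cell (voxel level).
    middle = (coords >> point_bits) & ((1 << voxel_bits) - 1)
    # Least significant bits: residual detail (point level).
    fine = coords & ((1 << point_bits) - 1)
    return coarse, middle, fine


def merge_bit_depth(coarse, middle, fine, total_bits, tree_bits, voxel_bits):
    """Recombine the three bit groups into full-precision coordinates."""
    point_bits = total_bits - tree_bits - voxel_bits
    return (coarse << (voxel_bits + point_bits)) | (middle << point_bits) | fine


# Example: one 10-bit point, split as 4 coarse + 3 middle + 3 fine bits.
points = np.array([[1023, 0, 517]])
coarse, middle, fine = split_bit_depth(points, total_bits=10, tree_bits=4, voxel_bits=3)
restored = merge_bit_depth(coarse, middle, fine, total_bits=10, tree_bits=4, voxel_bits=3)
```

In this lossless split, merging the three groups returns the original coordinates exactly. A lossy codec would instead reconstruct the middle and fine groups approximately, trading bitrate against distortion.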
