PIVOT-Net: Heterogeneous Point-Voxel-Tree-based Framework for Point Cloud Compression

Jiahao Pang, Kevin Bui, Dong Tian

TL;DR

PIVOT-Net tackles the challenge of compressing point clouds across varying bit-depths by unifying point-, voxel-, and tree-based representations within a single learning-based framework. It assigns coarse bits to tree coding, middle bits to voxel-domain processing with context-aware upsampling and an Enhanced Voxel Transformer, and fine bits to point-based networks, enabling RD-optimized reconstruction. The approach delivers state-of-the-art results on diverse datasets, including solid, dense, sparse, and LiDAR point clouds, and demonstrates clear gains over baselines like G-PCC, GRASP-Net, and SparsePCGC. This heterogeneous framework offers practical benefits for efficient, scalable PCC across real-world applications with different sparsity and detail characteristics.

Abstract

The universality of the point cloud format enables many 3D applications, making the compression of point clouds a critical phase in practice. Sampled as discrete 3D points, a point cloud approximates 2D surface(s) embedded in 3D with a finite bit-depth. However, the point distribution of a practical point cloud changes drastically as its bit-depth increases, requiring different methodologies for effective consumption/analysis. In this regard, a heterogeneous point cloud compression (PCC) framework is proposed. We unify typical point cloud representations -- point-based, voxel-based, and tree-based representations -- and their associated backbones under a learning-based framework to compress an input point cloud at different bit-depth levels. Having recognized the importance of voxel-domain processing, we augment the framework with a proposed context-aware upsampling for decoding and an enhanced voxel transformer for feature aggregation. Extensive experimentation demonstrates the state-of-the-art performance of our proposal on a wide range of point clouds.
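
To make the coarse/middle/fine bit assignment concrete, the sketch below partitions a single quantized coordinate into three bit ranges, one per representation. It is only an illustration of the bit partition: the function names, the 12-bit depth, and the 5/4/3 split are hypothetical and are not taken from the paper, and the learned voxel and point networks (which code the lower levels lossily) are not modeled.

```python
def split_bit_levels(coord, bit_depth, coarse_bits, middle_bits):
    """Split a non-negative integer coordinate into (coarse, middle, fine) parts.

    Hypothetical helper: the top `coarse_bits` would go to tree coding,
    the next `middle_bits` to voxel-domain processing, and the remaining
    low bits to the point-based network.
    """
    fine_bits = bit_depth - coarse_bits - middle_bits
    if fine_bits < 0:
        raise ValueError("coarse_bits + middle_bits exceeds bit_depth")
    if not 0 <= coord < (1 << bit_depth):
        raise ValueError("coord does not fit in bit_depth bits")
    coarse = coord >> (middle_bits + fine_bits)               # most significant bits
    middle = (coord >> fine_bits) & ((1 << middle_bits) - 1)  # middle bits
    fine = coord & ((1 << fine_bits) - 1)                     # least significant bits
    return coarse, middle, fine


def merge_bit_levels(coarse, middle, fine, bit_depth, coarse_bits, middle_bits):
    """Inverse of split_bit_levels: reassemble the full-precision coordinate."""
    fine_bits = bit_depth - coarse_bits - middle_bits
    return (coarse << (middle_bits + fine_bits)) | (middle << fine_bits) | fine


# Assumed example: a 12-bit coordinate split as 5 coarse + 4 middle + 3 fine bits.
parts = split_bit_levels(0b101101110010, 12, 5, 4)
print(parts)                               # (22, 14, 2)
print(merge_bit_levels(*parts, 12, 5, 4))  # 2930
```

In this toy version the split is exactly invertible; in a lossy codec only the coarse part would be reproduced exactly, and the lower levels would be approximated under a rate-distortion trade-off.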

Paper Structure

This paper contains 14 sections, 3 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Several 3D point clouds at different bit-depth levels.
  • Figure 2: Comparisons of lossy PCC frameworks utilizing different point cloud representations.
  • Figure 3: Architecture of our PIVOT-Net. The orange blocks contain learnable neural network layers, where point analysis/synthesis are point-based neural networks while voxel and feature analysis/synthesis are sparse CNNs.
  • Figure 4: Context-aware upsampling for adaptive voxel synthesis. Learnable modules are colored in yellow.
  • Figure 5: Enhanced Voxel Transformer (left) and its self-attention module (right). Learnable modules are colored in yellow.
  • ...and 4 more figures