Geometric Prior Based Deep Human Point Cloud Geometry Compression

Xinju Wu, Pingping Zhang, Meng Wang, Peilin Chen, Shiqi Wang, Sam Kwong

TL;DR

This work introduces a geometric-prior based deep compression framework for high-resolution human point clouds by leveraging a parametric 3D prior (SMPL-style) to initialize geometry with a compact set of parameters and then encoding residual feature deviations through warping and entropy modeling. The method comprises a two-stage process: (i) geometric-prior representation to generate an aligned reference, and (ii) residual feature extraction and compression with feature warping, enabling plug-and-play integration with existing PCC pipelines. Empirical results across multiple datasets (including humans and animals) show large BD-Rate improvements over traditional codecs (G-PCC, V-PCC) and learning-based baselines (PCGC, PCGCv2), with notable PSNR gains and qualitative improvements in local geometry detail. The approach demonstrates strong generalization to different geometries and resolutions, suggesting significant practical impact for realistic digital avatars in XR/metaverse contexts.

Abstract

The emergence of digital avatars has driven an exponential increase in the demand for human point clouds with realistic and intricate details. Compressing such data is challenging because each point cloud can comprise millions of points. Herein, we leverage the human geometric prior to remove geometry redundancy in point clouds, greatly improving the compression performance. More specifically, the prior provides topological constraints as geometry initialization, allowing adaptive adjustments with a compact parameter set that can be represented with only a few bits. Therefore, we can regard high-resolution human point clouds as a combination of geometric priors and structural deviations. The prior is first used to derive an aligned point cloud, and the difference between the features of the source and aligned point clouds is then compressed into a compact latent code. The proposed framework can operate in a plug-and-play fashion with existing learning-based point cloud compression methods. Extensive experimental results show that our approach significantly improves the compression performance without deteriorating the quality, demonstrating its promise in a variety of applications.
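To make the "compact parameter set" idea concrete, the following is a minimal sketch of uniform scalar quantization applied to a vector of prior parameters. It is an illustration only, not the authors' quantizer: the vector length of 82 (72 pose plus 10 shape values, as in a standard SMPL model), the value range of [-1, 1], the 8-bit depth, and the function names `quantize_params` and `dequantize_params` are all assumptions introduced here.

```python
import numpy as np


def quantize_params(params, lo, hi, bits=8):
    """Map each value in [lo, hi] to an integer index that fits in `bits` bits."""
    levels = (1 << bits) - 1
    clipped = np.clip(params, lo, hi)
    return np.round((clipped - lo) / (hi - lo) * levels).astype(np.int64)


def dequantize_params(indices, lo, hi, bits=8):
    """Recover approximate parameter values from the integer indices."""
    levels = (1 << bits) - 1
    return indices / levels * (hi - lo) + lo


# Hypothetical parameter vector: 72 pose + 10 shape values (an assumption).
rng = np.random.default_rng(0)
params = rng.uniform(-1.0, 1.0, size=82)

indices = quantize_params(params, -1.0, 1.0, bits=8)
restored = dequantize_params(indices, -1.0, 1.0, bits=8)

total_bits = indices.size * 8                      # cost before entropy coding
max_error = float(np.max(np.abs(restored - params)))
print(total_bits, max_error)
```

Under these assumptions the whole prior costs 656 bits before any entropy coding, which is negligible next to a point cloud with millions of points, and the reconstruction error per parameter is bounded by half a quantization step.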

Paper Structure (32 sections, 9 equations, 15 figures, 5 tables)

Figures (15)

  • Figure 1: Comparisons of human point cloud geometry compression paradigms. Existing approaches directly compress the source point cloud and transmit (a) voxelwise features or (b) pointwise coordinates and features. (c) The proposed scheme incorporates a geometric prior to remove the redundancy at the feature level, followed by residual feature compression, yielding better compression performance.
  • Figure 2: Visual quality comparisons of (a) a source point cloud, (b) intermediate 3D models generated by our approach, and (c) reconstructed point clouds at 0.125 bits per point (bpp) for our approach and 0.152 bpp for PCGCv2 [wang2021multiscale].
  • Figure 3: Overview of our proposed framework that involves a two-stage process for geometric prior representation and feature residual compression. Given a source point cloud $\mathbf{S}$, we first regress an aligned mesh $\mathbf{T}$ that can be driven by a set of parameters from a deformable template mesh $\bar{\mathbf{T}}$. During encoding, these parameters are further quantized into a compact bitstream, allowing for the manipulation of the template mesh's pose and shape during decoding. Regarding the next stage, we extract features from both the source point cloud and an aligned point cloud based on the sparse tensors that comprise coordinates and features. We then warp the features of the aligned point cloud onto the coordinates of the source point cloud, subsequently calculating residual features. These residual features are further encoded with guidance from an entropy model. The decoder, situated at the lower part of the framework, processes bitstreams to initiate the decoding process.
  • Figure 4: The network structure of (a) feature extraction, (b) feature warping, and (c) feature propagation modules. The input of the feature extraction module can be coordinates of the source point cloud $\mathbf{C}_\mathbf{S}$ or the aligned point cloud $\mathbf{C}_\mathbf{T}$. "Conv/2$\downarrow$" and "Conv/2$\uparrow$" represent the convolution and transposed convolution operations, respectively, with a stride of 2. "Conv on Coords" convolves on target coordinates using a generalized transposed sparse convolution layer [SparseConvNet, akhtar2022interframe]. We consider an example with three scales, where $L=3$.
  • Figure 5: The 2D illustration of (a) vanilla sparse convolution and (b) the layer of convolution on coordinates.
  • ...and 10 more figures
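
The second stage described in the Figure 3 caption (warp the aligned point cloud's features onto the source coordinates, then code only the residual) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's method: the learned sparse-convolution warping module is replaced here by a brute-force nearest-neighbor lookup, quantization and entropy coding of the residual are omitted, and the names `warp_features`, `encode_residual`, and `decode_features` are hypothetical.

```python
import numpy as np


def warp_features(src_coords, ref_coords, ref_feats):
    """Assign each source point the features of its nearest reference point.

    Nearest-neighbor lookup is a stand-in for the learned warping module.
    """
    diff = src_coords[:, None, :] - ref_coords[None, :, :]
    nearest = (diff ** 2).sum(axis=-1).argmin(axis=1)
    return ref_feats[nearest]


def encode_residual(src_feats, warped_feats):
    """Encoder side: keep only what the prior fails to predict."""
    return src_feats - warped_feats


def decode_features(residual, warped_feats):
    """Decoder side: add the residual back onto the warped prior features."""
    return residual + warped_feats


# Toy data: 3 source points, 4 points sampled from the aligned prior.
src_coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
ref_coords = np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 0.0],
                       [0.0, 2.0, 1.0], [5.0, 5.0, 5.0]])
src_feats = np.array([[1.5, 2.0], [3.0, 3.0], [5.0, 7.0]])
ref_feats = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])

warped = warp_features(src_coords, ref_coords, ref_feats)
residual = encode_residual(src_feats, warped)
reconstructed = decode_features(residual, warped)
print(residual)
```

Because the decoder can regenerate the aligned point cloud from the transmitted prior parameters, it can recompute `warped` itself, so only `residual` needs to be sent. When the prior fits the source well, the residual values are small, which is what makes them cheaper to code than the raw features.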