DiffCom: Decoupled Sparse Priors Guided Diffusion Compression for Point Clouds

Xiaoge Zhang, Zijie Wu, Mehwish Nasim, Mingtao Feng, Saeed Anwar, Ajmal Mian

TL;DR

DiffCom tackles lossy point-cloud compression by decoupling the latent representations used for reconstruction from the sparse priors used for storage, via a dual-density data flow. Dense latents ($X_l$, $F_l$) are recovered from sparse priors ($X_s$, $F_s$) through a Gaussian Mixture Model representation and diffusion guided by a Probabilistic Attention-based Conditional Denoiser (PACD), while a context-aware entropy model codes the priors into a compact bitstream. The method reports superior rate-distortion performance on ShapeNet and MPEG PCC benchmarks, with strong PSNR gains at very low bitrates and reduced decoding time when fewer diffusion steps are used. Separating the reconstruction and storage pathways also keeps the framework flexible and scalable.

Abstract

Lossy compression relies on an autoencoder to transform a point cloud into latent points for storage, leaving the inherent redundancy of latent representations unexplored. To reduce redundancy in latent points, we propose a diffusion-based framework guided by sparse priors that achieves high reconstruction quality, especially at low bitrates. Our approach features an efficient dual-density data flow that relaxes size constraints on latent points. It hybridizes a probabilistic conditional diffusion model to encapsulate essential details for reconstruction within sparse priors, which are decoupled hierarchically into intra- and inter-point priors. Specifically, our DiffCom encodes the original point cloud into latent points and decoupled sparse priors through separate encoders. To dynamically attend to geometric and semantic cues from the priors at each encoding and decoding layer, we employ an attention-guided latent denoiser conditioned on the decoupled priors. Additionally, we integrate the local distribution into the arithmetic encoder and decoder to enhance local context modeling of the sparse points. The original point cloud is reconstructed through a point decoder. Compared to state-of-the-art methods, our approach achieves a superior rate-distortion trade-off, as evidenced by extensive evaluations on the ShapeNet dataset and standard test datasets from the MPEG PCC Group.
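The decoding path described in the abstract (start from noise, then denoise the latent points while conditioning on the sparse priors) can be illustrated with a minimal DDPM-style ancestral sampler. This is only a sketch under stated assumptions: the linear noise schedule, the function names, and `toy_denoiser` are hypothetical and not taken from the paper. In particular, `toy_denoiser` stands in for the learned attention-guided denoiser by pretending the clean dense latent is each sparse prior point repeated a fixed number of times.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_denoiser(x_t, alpha_bar_t, sparse_prior):
    """Hypothetical stand-in for a learned conditional denoiser.

    It assumes the clean dense latent is the sparse prior with every point
    repeated, and returns the noise implied by
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    """
    repeats = x_t.shape[0] // sparse_prior.shape[0]
    x0_guess = np.repeat(sparse_prior, repeats, axis=0)
    return (x_t - np.sqrt(alpha_bar_t) * x0_guess) / np.sqrt(1.0 - alpha_bar_t)


def sample_latent(sparse_prior, num_points, num_steps=50, seed=0):
    """Recover a dense latent of shape (num_points, D) from sparse priors."""
    if num_points % sparse_prior.shape[0] != 0:
        raise ValueError("num_points must be a multiple of the prior size")
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    shape = (num_points, sparse_prior.shape[1])

    # Start from pure Gaussian noise at the dense resolution.
    x = rng.standard_normal(shape)
    for t in range(num_steps - 1, -1, -1):
        eps_hat = toy_denoiser(x, alpha_bars[t], sparse_prior)
        # Posterior mean of the standard DDPM reverse step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # no noise is added at the final step
    return x


prior = np.arange(12.0).reshape(4, 3)       # 4 sparse prior points, 3-D
dense = sample_latent(prior, num_points=8)  # 8 dense latent points
```

In this toy setting the sampler converges to the repeated prior points, which only demonstrates the control flow; a trained denoiser would instead predict noise from learned geometric and semantic cues. Reducing `num_steps` shortens the loop, which is the knob behind the trade-off between decoding time and reconstruction quality.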

Paper Structure

This paper contains 14 sections, 13 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: Improvements achieved by the proposed method. Errors beyond the maximum threshold are truncated. All errors are computed from the ground truth to the reconstructed point cloud and visualized on the ground truth point cloud.
  • Figure 2: (a) Conventional compressors code the latent representation of points directly via a naive context-free entropy model. (b) Our method codes the sparse priors via a context-aware entropy model. It employs a two-stage data flow to compress the points further into decoupled sparse priors. It incorporates a probabilistic attention-based conditional diffusion model to denoise the latent representations conditioned on the sparse priors.
  • Figure 3: Overview of our decoupled sparse priors guided diffusion compression model for point clouds (DiffCom). Instead of directly encoding the input point cloud into latent points and features, we utilize a sparse point encoder to extract a sparser representation. The sparser representation is decoupled into sparse points and intra-point local distributions. During decompression, we begin with Gaussian noise and apply a probabilistic attention-based conditional denoiser (PACD) conditioned on the sparse priors, reconstructing the latent representation. These reconstructed latents are then decoded to produce a high-quality point cloud.
  • Figure 4: Network architecture of PCA-Denoiser. The model adopts a U-Net–like design, where the encoder consists of fusion and set abstraction (SA) modules, and the decoder comprises feature propagation (FP) and fusion modules. An MLP layer maps the decoded features to the target dimension. Time embeddings are injected into both the encoder and decoder to provide timestep-dependent guidance throughout the denoising process.
  • Figure 5: Architecture of the downsampling and upsampling blocks. In the downsampling block, points are downsampled using a set abstraction (SA) module, and features are enriched with a PointTransformer layer. In the upsampling block, residual point coordinates are reconstructed by learning scales and weights over predefined directional hypotheses, while residual features are recovered through UpConv operations.
  • ...and 7 more figures
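The downsampling step mentioned in the Figure 5 caption relies on a set abstraction (SA) module. In the PointNet++ family, SA modules typically choose their centroids with farthest point sampling (FPS); whether DiffCom uses exactly this sampler is an assumption here. A minimal NumPy sketch of FPS, with hypothetical function and parameter names:

```python
import numpy as np


def farthest_point_sampling(points, num_samples, start_index=0):
    """Return indices of num_samples points chosen greedily to be far apart.

    points: array-like of shape (N, D). The first pick is start_index; each
    later pick is the point farthest from everything selected so far.
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    if not 0 < num_samples <= n:
        raise ValueError("num_samples must be in the range 1..N")

    selected = np.empty(num_samples, dtype=int)
    selected[0] = start_index
    # Squared distance from every point to its nearest selected point.
    min_dist = np.sum((points - points[start_index]) ** 2, axis=1)

    for i in range(1, num_samples):
        nxt = int(np.argmax(min_dist))
        selected[i] = nxt
        dist_to_new = np.sum((points - points[nxt]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected


cloud = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [10, 0, 0]]
idx = farthest_point_sampling(cloud, num_samples=3)
```

A full SA module would additionally group neighbors around each selected centroid and aggregate their features; the sketch covers only the point-selection part.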