Diff-PCC: Diffusion-based Neural Compression for 3D Point Clouds

Kai Liu, Kang You, Pan Gao

TL;DR

This work introduces Diff-PCC, a diffusion-based lossy point cloud compression framework that employs a dual-space latent encoding to extract complementary low- and high-frequency shape information and a diffusion-based generator that denoises noisy point clouds under the guidance of these latents. By integrating a hyperprior-driven rate model and a rate-distortion objective, Diff-PCC achieves superior compression performance compared to G-PCC and recent learning-based methods, with substantial BD-PSNR gains at ultra-low bitrates and improved perceptual quality. The method advances neural point cloud compression by addressing the limitations of Gaussian priors in VAEs and leveraging diffusion models for high-fidelity reconstruction, while also introducing architecture elements like AdaLN-based conditioning and cross-frequency feature fusion. Although effective, the approach incurs higher coding complexity and currently targets smaller-scale point clouds, pointing to future work in acceleration and scalability for broader 3D workloads.
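The rate-distortion objective mentioned above can be illustrated with a small sketch. It assumes a symmetric Chamfer distance as the distortion term and bits per point as the rate term, combined as D + λR; the function names `chamfer_distance` and `rd_loss`, the parameter `lam`, and the exact weighting are illustrative assumptions, not the formulation used in the paper.

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    # Pairwise squared Euclidean distances, shape (N, M).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # Mean squared nearest-neighbor distance in both directions.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

def rd_loss(original: np.ndarray, reconstruction: np.ndarray,
            total_bits: float, lam: float = 1.0) -> float:
    """Hypothetical rate-distortion cost D + lam * R, with R in bits per point."""
    distortion = chamfer_distance(original, reconstruction)
    rate_bpp = total_bits / original.shape[0]
    return distortion + lam * rate_bpp
```

A larger `lam` penalizes rate more heavily and pushes an optimizer toward lower bitrates at the cost of higher distortion, which is the regime where the summary reports the largest gains.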

Abstract

Stable diffusion networks have emerged as a groundbreaking development for their ability to produce realistic and detailed visual content. This characteristic renders them ideal decoders, capable of producing high-quality and aesthetically pleasing reconstructions. In this paper, we introduce the first diffusion-based point cloud compression method, dubbed Diff-PCC, to leverage the expressive power of the diffusion model for generative and aesthetically superior decoding. Unlike conventional autoencoder designs, Diff-PCC uses a dual-space latent representation: a compressor composed of two independent encoding backbones extracts expressive shape latents from distinct latent spaces. On the decoding side, a diffusion-based generator produces high-quality reconstructions by using the shape latents as guidance to stochastically denoise noisy point clouds. Experiments demonstrate that the proposed Diff-PCC achieves state-of-the-art compression performance (e.g., 7.711 dB BD-PSNR gains against the latest G-PCC standard at ultra-low bitrate) while attaining superior subjective quality. Source code will be made publicly available.
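To make the idea of denoising noisy point clouds concrete, here is a minimal sketch of the standard DDPM noising and ancestral denoising steps applied to an (N, 3) point array. The linear schedule, its parameter values, and the function names are assumptions for illustration. In Diff-PCC the noise prediction would come from the generator conditioned on the decoded shape latents; here it is simply passed in as `eps_pred`.

```python
import numpy as np

def make_schedule(num_steps=200, beta_start=1e-4, beta_end=0.05):
    """Linear variance schedule (assumed values, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

def denoise_step(xt, t, eps_pred, betas, alphas, alpha_bars, rng):
    """One ancestral DDPM denoising step given a noise prediction eps_pred."""
    mean = (xt - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)

# Usage: noise a toy cloud at step 0, then denoise with the true noise.
rng = np.random.default_rng(0)
betas, alphas, alpha_bars = make_schedule()
x0 = rng.uniform(-1.0, 1.0, size=(2048, 3))
x_noisy, eps = add_noise(x0, 0, alpha_bars, rng)
x_rec = denoise_step(x_noisy, 0, eps, betas, alphas, alpha_bars, rng)
```

With an oracle noise prediction the last step recovers the input exactly; a learned generator only approximates that prediction, which is where the guidance from the transmitted latents matters.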

Paper Structure

This paper contains 20 sections, 15 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Diff-PCC pipeline. $X_{t}$ and $\bar{X}_{t}$ represent the $t$-th original point cloud and noisy point cloud, respectively; $p$ refers to the forward process and $q$ refers to the reverse process; $N(0,\boldsymbol{I})$ denotes pure noise. The entropy model and arithmetic coding are omitted for conciseness.
  • Figure 2: Detailed structure of the compressor and generator. $y_l$ and $y_h$ refer to the low-frequency shape latent and the high-frequency detail latent, respectively; $z$ denotes the hyperprior latent; $Q$ denotes quantization; AE and AD represent arithmetic encoding and decoding.
  • Figure 3: Rate-distortion curves for performance comparison. From left to right: ShapeNet, ModelNet10, and ModelNet40 datasets.
  • Figure 4: Subjective quality comparison. Example point clouds are selected from the ShapeNet dataset, each with 2k points.
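
The quantization $Q$ and the entropy-coded latents named in the Figure 2 caption can be sketched as follows. This is a minimal illustration that assumes rounding to the nearest integer and a Gaussian entropy model whose mean and scale would be predicted from the hyperprior latent $z$, as in common hyperprior codecs. The function names and the Gaussian form are assumptions, not details confirmed by the text above, and the arithmetic coder itself is not implemented; only the ideal code length is estimated.

```python
import math
import numpy as np

def gaussian_cdf(x, mean, scale):
    """Gaussian CDF evaluated elementwise (math.erf wrapped for arrays)."""
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf((x - mean) / (scale * math.sqrt(2.0))))

def quantize(y):
    """Q: round each latent element to the nearest integer."""
    return np.round(y)

def estimate_bits(y_hat, mean, scale):
    """Ideal code length of quantized latents under a Gaussian entropy model.

    Each symbol's probability is the Gaussian mass on [y_hat - 0.5, y_hat + 0.5].
    """
    upper = gaussian_cdf(y_hat + 0.5, mean, scale)
    lower = gaussian_cdf(y_hat - 0.5, mean, scale)
    likelihood = np.clip(upper - lower, 1e-9, 1.0)
    return float(-np.log2(likelihood).sum())

# Usage: a well-matched, narrow model spends fewer bits than a broad one.
y_hat = quantize(np.array([0.2, -1.6, 3.4]))
bits_broad = estimate_bits(y_hat, mean=0.0, scale=2.0)
bits_matched = estimate_bits(y_hat, mean=y_hat, scale=0.2)
```

An arithmetic encoder (AE) driven by the same probabilities would produce a bitstream of roughly this length, and the decoder (AD) needs the identical mean and scale to invert it, which is why the hyperprior is transmitted as side information.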