Bits-to-Photon: End-to-End Learned Scalable Point Cloud Compression for Direct Rendering

Yueyu Hu; Ran Gong; Yao Wang

TL;DR

Bits-to-Photon (B2P) is an end-to-end learned point cloud compression framework whose bit stream decodes directly to 3D Gaussians that can be rendered with a differentiable Gaussian splatting renderer. It uses an octree-based, multi-resolution pipeline with conditional entropy coding, geometry-invariant 3D sparse convolutions, and a predictive Gaussian generation module. The bit stream is scalable from level $L$ to level $N$, and rendering can take place at any level $M$ with $L \le M \le N$. The encoder and decoder are trained jointly with a rate-distortion objective that balances bit-rate against rendering quality (PSNR, LPIPS, MS-SSIM) across multiple scalable points. B2P outperforms G-PCC and learned baselines in rendering fidelity at similar bit-rates while reducing decoding latency. It enables real-time color decoding and rendering for interactive 3D streaming on THuman 2.0 and 8iVFB, and suggests temporal extension and region-adaptive coding as avenues for further improving scalability and visual quality.
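To make the level notation concrete, here is a minimal sketch of how a coarser octree level can be derived from occupied voxel coordinates at a finer level. It is not the authors' code: the function name `coarsen` and the toy coordinates are hypothetical, and it only illustrates the general octree property that each level halves the resolution along every axis.

```python
import numpy as np

def coarsen(voxels: np.ndarray, levels_up: int) -> np.ndarray:
    """Return the occupied voxels `levels_up` octree levels coarser.

    `voxels` is an (n, 3) array of non-negative integer coordinates.
    Hypothetical helper for illustration only.
    """
    # Moving up one octree level halves the resolution per axis,
    # which amounts to dropping the lowest bit of each coordinate.
    parents = voxels >> levels_up
    # Several children can share one parent, so remove duplicates.
    return np.unique(parents, axis=0)

# Toy input (assumed): four occupied voxels at level N = 3, an 8x8x8 grid.
voxels_n = np.array([[0, 0, 0], [1, 1, 0], [6, 7, 7], [7, 7, 7]])
# One level coarser, M = 2, a 4x4x4 grid.
voxels_m = coarsen(voxels_n, 1)
print(voxels_m)
```

In this toy case the four fine voxels collapse into two coarse ones, which is why a lower level needs fewer bits and carries less detail.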

Abstract

Point clouds are a promising 3D representation for volumetric streaming in emerging AR/VR applications. Despite recent advances in point cloud compression, decoding and rendering high-quality images from lossy compressed point clouds is still challenging in terms of both quality and complexity, making it a major roadblock to achieving real-time 6-Degree-of-Freedom video streaming. In this paper, we address this problem by developing a point cloud compression scheme that generates a bit stream that can be directly decoded to renderable 3D Gaussians. The encoder and decoder are jointly optimized to consider both bit-rates and rendering quality. Compared to existing point cloud compression methods, the scheme significantly improves rendering quality while substantially reducing decoding and rendering time. Furthermore, the proposed scheme generates a scalable bit stream, allowing multiple levels of detail at different bit-rate ranges. Our method supports real-time color decoding and rendering of high-quality point clouds, thus paving the way for interactive 3D streaming applications with free viewpoints.
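The joint optimization of bit-rate and rendering quality can be pictured as a Lagrangian rate-distortion loss summed over the scalable levels. The sketch below is an assumption, not the paper's training objective, which is not given in this section: the function name, the choice to apply the weight `lam` to the distortion term, and the use of cumulative bits per level are all illustrative.

```python
def scalable_rd_loss(bits_per_layer, distortion_per_level, lam):
    """Sum of (cumulative rate + lam * distortion) over scalable levels.

    bits_per_layer[i]       : bits added by the i-th layer of the stream
    distortion_per_level[i] : rendering distortion after decoding layers 0..i
    lam                     : assumed trade-off weight on distortion
    """
    total = 0.0
    cumulative_bits = 0.0
    for bits, dist in zip(bits_per_layer, distortion_per_level):
        # Decoding level i requires every layer up to and including i.
        cumulative_bits += bits
        total += cumulative_bits + lam * dist
    return total

# Toy numbers (assumed): two layers, with distortion falling as layers are added.
loss = scalable_rd_loss([1.0, 2.0], [0.5, 0.25], lam=4.0)
print(loss)
```

Under this formulation a larger `lam` favors rendering quality over bit-rate, so training with different values would yield models at different operating points.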

Paper Structure (25 sections, 7 equations, 7 figures, 6 tables)

Introduction
Related Works
Point Cloud Compression
3D Gaussian Representation for Point Cloud Rendering
Method
Overview of the Proposed Framework
Full-Resolution Feature Extraction
Conditional Transform and Entropy Coding
Predictive 3D Gaussian Generation
Training
Experiments
Settings
Rate-Distortion Performance
Bit-rate, Complexity, and Distortion Tradeoff
Ablation Study
...and 10 more sections

Figures (7)

Figure 1: The proposed multi-resolution compression and rendering framework. The source point cloud is encoded into a scalable bit-stream and delivered up to a resolution determined by the sustainable network throughput. The client decodes the received bit-stream to 3D Gaussians and renders the point cloud according to the client's current viewpoint.
Figure 2: Illustration of key components of the proposed method.
Figure 3: Rendering distortion at different bit-rates for the proposed Bits-to-Photon (B2P) and baseline methods. We train 2 models with 2 different $\lambda$ values. Each model has 2 levels of scalability, forming 2 groups of scalable R-D points, plotted using different colors. Other methods are not scalable.
Figure 4: Visual results on the decoded and rendered point clouds on the THuman 2.0 dataset, compared to images rendered from the ground truth meshes.
Figure 5: Visual results on the decoded and rendered point clouds on the 8iVFB dataset, compared to images rendered using the generated meshes from original point clouds.
...and 2 more figures
