Bits-to-Photon: End-to-End Learned Scalable Point Cloud Compression for Direct Rendering
Yueyu Hu, Ran Gong, Yao Wang
TL;DR
Bits-to-Photon (B2P) is an end-to-end learned point cloud compression framework whose bit stream decodes directly to renderable 3D Gaussians, trained through a differentiable Gaussian splatting renderer. It employs an octree-based, multi-resolution pipeline with conditional entropy coding, geometry-invariant 3D sparse convolutions, and a predictive Gaussian generation module to achieve scalable detail from level $L$ to $N$, rendering at level $M$ ($L \le M \le N$). The encoder and decoder are trained with a rate–distortion objective that jointly optimizes bit-rate and rendering quality (PSNR, LPIPS, MS-SSIM) across multiple scalable points. B2P outperforms G-PCC and learned baselines in rendering fidelity at similar bit-rates while reducing decoding latency. The approach enables real-time color decoding and rendering for interactive 3D streaming, as evaluated on THuman 2.0 and 8iVFB, and suggests promising avenues for temporal extension and region-adaptive coding to further enhance scalability and visual quality.
Abstract
Point clouds are a promising 3D representation for volumetric streaming in emerging AR/VR applications. Despite recent advances in point cloud compression, decoding and rendering high-quality images from lossy compressed point clouds remains challenging in terms of both quality and complexity, making it a major roadblock to achieving real-time 6-Degree-of-Freedom video streaming. In this paper, we address this problem by developing a point cloud compression scheme that generates a bit stream that can be directly decoded to renderable 3D Gaussians. The encoder and decoder are jointly optimized to account for both bit-rate and rendering quality. Compared to existing point cloud compression methods, the proposed scheme significantly improves rendering quality while substantially reducing decoding and rendering time. Furthermore, it generates a scalable bit stream, allowing multiple levels of detail at different bit-rate ranges. Our method supports real-time color decoding and rendering of high-quality point clouds, paving the way for interactive 3D streaming applications with free viewpoints.
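To make the joint optimization of bit-rate and rendering quality concrete, the following is a minimal sketch of a rate-distortion loss summed over several scalable operating points. It is an illustration under stated assumptions, not the authors' training code: the `LevelResult` container, the use of plain mean squared error as the distortion term, the bits-per-point normalization, and the trade-off weight `lam` are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LevelResult:
    """Hypothetical outputs for one scalable operating point (one level of detail)."""
    bits: float                  # estimated bits needed to decode up to this level
    num_points: int              # number of input points, used to normalize the rate
    rendered: Sequence[float]    # flattened rendered pixel values in [0, 1]
    reference: Sequence[float]   # flattened ground-truth pixel values in [0, 1]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized pixel sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rd_loss(levels: Sequence[LevelResult], lam: float = 1.0) -> float:
    """Sum of (rate + lam * distortion) over all scalable operating points.

    Assumption: rate is measured in bits per input point and distortion is
    the MSE of the rendered image; a real system could substitute a
    perceptual metric for the distortion term.
    """
    total = 0.0
    for level in levels:
        rate = level.bits / level.num_points
        distortion = mse(level.rendered, level.reference)
        total += rate + lam * distortion
    return total


if __name__ == "__main__":
    # Two toy operating points: a coarse level and a finer, costlier one.
    coarse = LevelResult(bits=100.0, num_points=50,
                         rendered=[0.5, 0.5], reference=[0.5, 1.0])
    fine = LevelResult(bits=200.0, num_points=50,
                       rendered=[0.5, 1.0], reference=[0.5, 1.0])
    print(rd_loss([coarse, fine], lam=2.0))
```

Summing the per-level terms means every level of detail contributes to the gradient, which is one simple way to encourage a single bit stream to render well at each bit-rate range rather than only at the highest one.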
