Plenoptic PNG: Real-Time Neural Radiance Fields in 150 KB

Jae Yong Lee, Yuqun Wu, Chuhang Zou, Derek Hoiem, Shenlong Wang

TL;DR

Plenoptic PNG (PPNG) addresses the challenge of transmitting and rendering photorealistic free-viewpoint scenes with a tiny, universally viewable representation. It introduces a Fourier-indexed voxel representation that shares features across locations, adds tensor factorization to reach KB-scale models, and decodes directly into GL textures and GLSL shaders for real-time WebGL rendering. Model sizes range from as small as 151 KB (PPNG-1) up to tens of MB (PPNG-3), with competitive rendering quality and significantly faster training than real-time baselines. This compact, web-friendly pipeline allows neural scenes to be distributed widely and viewed interactively on lightweight devices through standard graphics pipelines.

Abstract

The goal of this paper is to encode a 3D scene into an extremely compact representation from 2D images and to enable its transmission, decoding and rendering in real-time across various platforms. Despite the progress in NeRFs and Gaussian Splats, their large model size and specialized renderers make it challenging to distribute free-viewpoint 3D content as easily as images. To address this, we have designed a novel 3D representation that encodes the plenoptic function into sinusoidal-function-indexed dense volumes. This approach facilitates feature sharing across different locations, improving compactness over traditional spatial voxels. The memory footprint of the dense 3D feature grid can be further reduced using spatial decomposition techniques. This design combines the strengths of spatial hashing functions and voxel decomposition, resulting in a model size as small as 150 KB for each 3D scene. Moreover, PPNG features a lightweight rendering pipeline with only 300 lines of code that decodes its representation into standard GL textures and fragment shaders. This enables real-time rendering using the traditional GL pipeline, ensuring universal compatibility and efficiency across various platforms without additional dependencies.
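
The idea of indexing a dense volume by sinusoidal functions can be illustrated with a toy lookup. The following is a minimal sketch, not the paper's implementation. It assumes that each frequency level maps a 3D point through sine and cosine into $[-1, 1]^3$ and uses the result as a coordinate into a small dense grid with trilinear interpolation. The frequency scaling ($2^l \pi$), the grid resolution, the scalar (single-channel) features, and the names `trilinear` and `fourier_indexed_features` are all hypothetical choices made for illustration.

```python
import math


def trilinear(grid, u, v, w):
    """Trilinearly interpolate a cubic scalar grid at (u, v, w), each in [-1, 1]."""
    res = len(grid)  # assumes a res x res x res nested list with res >= 2
    # Map [-1, 1] to continuous grid coordinates in [0, res - 1].
    coords = [(min(max(c, -1.0), 1.0) + 1.0) * 0.5 * (res - 1) for c in (u, v, w)]
    lo = [min(int(c), res - 2) for c in coords]  # lower corner of the cell
    frac = [c - l for c, l in zip(coords, lo)]
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                weight = ((frac[0] if dx else 1.0 - frac[0])
                          * (frac[1] if dy else 1.0 - frac[1])
                          * (frac[2] if dz else 1.0 - frac[2]))
                value += weight * grid[lo[0] + dx][lo[1] + dy][lo[2] + dz]
    return value


def fourier_indexed_features(point, sin_grids, cos_grids):
    """Concatenate sin- and cos-indexed features over all frequency levels.

    Hypothetical layout: one grid per level for sine, one per level for cosine.
    """
    features = []
    for level, (sin_grid, cos_grid) in enumerate(zip(sin_grids, cos_grids)):
        freq = (2.0 ** level) * math.pi  # assumed frequency schedule
        s = [math.sin(freq * p) for p in point]
        c = [math.cos(freq * p) for p in point]
        features.append(trilinear(sin_grid, *s))
        features.append(trilinear(cos_grid, *c))
    return features


# Toy data: two frequency levels, each with a 4^3 grid for sin and for cos.
toy = [[[i + 2 * j + 3 * k for k in range(4)] for j in range(4)] for i in range(4)]
near = fourier_indexed_features((0.1, 0.2, 0.3), [toy, toy], [toy, toy])
far = fourier_indexed_features((2.1, 2.2, 2.3), [toy, toy], [toy, toy])
print(near)
print(far)  # matches `near` up to rounding: both points index the same cells
```

The two query points differ by one full period of the lowest frequency, so they read the same grid cells. In this toy version, that is the sense in which distant locations share features, which a plain spatial voxel grid cannot do.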

Paper Structure

This paper contains 16 sections, 6 equations, 9 figures, 18 tables.

Figures (9)

  • Figure 1: Overview of our PPNG-1 Rendering Procedure: For a given PPNG file of a 3D scene, we first extract the factorized Fourier features and the shallow MLP weights (top-left). The factorized Fourier features are then composed to construct a dense Fourier-indexed feature grid (middle). In the rendering stage, for each query point we compute the sinusoidal positional encoding to extract the corresponding feature from the Fourier-indexed voxel grid. The feature vectors, spanning across the spectrum for both sine and cosine at each frequency, are then concatenated. These features are subsequently passed into the fragment shader, which employs a shallow MLP for inferring color and density and applies ray marching to determine the final pixel color.
  • Figure 2: Visualization of Two Factorized Plenoptic PNG Representations: PPNG-1 (Equation \ref{eq:ppng_1_to_3}) utilizes tensor-rank decomposition (left), while PPNG-2 (Equation \ref{eq:ppng_2_to_3}) employs tri-plane decomposition (right).
  • Figure 3: Quantitative Comparison with Real-Time, Web-Compatible NeRF Models on the NeRF Synthetic dataset. Our approaches are 2-3 orders of magnitude smaller than baselines in terms of model size (x-axis) and 10x-100x faster in training speed (marker size), while maintaining competitive PSNR (y-axis).
  • Figure 4: Qualitative Comparison on the Synthetic NeRF Dataset: We show qualitative results and compare against real-time NeRF models (SNeRG and MobileNeRF) in terms of training time, model size, and quality. PPNG-1 delivers similar or superior visual quality compared to other web-friendly baselines while being at least 120x smaller in model size. PPNG-2 offers enhanced quality with a model size more than 8x smaller.
  • Figure 5: Qualitative Results on Unbounded 360$^\circ$ Scenes: We highlight the background region in the top right corner and the central region in the bottom left corner. PPNG-3 provides compelling results with detailed textures in both cases. Factorized representations reach their capacity limits in such scenes. PPNG-1, with only 128 KB of parameters, fails to recreate fine details in both the central and background regions, and PPNG-2 also cannot recreate details in the background regions because its limited volume size restricts its capacity.
  • ...and 4 more figures
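
The captions for Figures 1 and 2 mention composing factorized features into a dense grid through tensor-rank decomposition. The general technique can be sketched as follows: a dense 3D volume is rebuilt as a sum of rank-1 outer products of per-axis vectors. This is a generic illustration of tensor-rank (CP) composition for a single scalar channel, not the paper's code. The function names, the toy factor values, and the resolution and rank used in the parameter count are hypothetical.

```python
def compose_dense_grid(factors_x, factors_y, factors_z):
    """Rebuild dense[i][j][k] = sum_r x[r][i] * y[r][j] * z[r][k].

    Each argument is a list of `rank` vectors, one per rank-1 component.
    """
    rank = len(factors_x)
    nx, ny, nz = len(factors_x[0]), len(factors_y[0]), len(factors_z[0])
    dense = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for r in range(rank):
        for i in range(nx):
            for j in range(ny):
                xy = factors_x[r][i] * factors_y[r][j]
                for k in range(nz):
                    dense[i][j][k] += xy * factors_z[r][k]
    return dense


def parameter_counts(res, rank):
    """Return (factorized, dense) scalar counts for one res^3 channel."""
    return 3 * rank * res, res ** 3


# Toy rank-2 factors for a 2 x 2 x 2 volume (values are arbitrary).
fx = [[1.0, 2.0], [0.0, 1.0]]
fy = [[1.0, 0.0], [1.0, 1.0]]
fz = [[3.0, 4.0], [1.0, 1.0]]
volume = compose_dense_grid(fx, fy, fz)
print(volume[1][0][1])
print(parameter_counts(32, 4))
```

Storage for the factors grows linearly with the grid resolution, while the dense volume grows cubically. That gap is why a factorized file can be far smaller than the grid it decodes to. The price is limited capacity: a low-rank sum cannot represent every dense volume.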