Lightweight Predictive 3D Gaussian Splats

Junli Cao; Vidit Goel; Chaoyang Wang; Anil Kag; Ju Hu; Sergei Korolev; Chenfanfu Jiang; Sergey Tulyakov; Jian Ren

TL;DR

The paper tackles the storage bottleneck of large-scale 3D Gaussian Splat representations with a lightweight predictive framework that stores only a subset of 'parent' splats and predicts the positions and attributes of nearby 'child' splats during rendering. Scenes are represented as a forest of depth-1 trees in which each child position satisfies $x_k = x_p + g_{pos}(f_\Delta)[k]$, where $x_p$ is the parent position and $f_\Delta$ is the parent's displacement feature. Attributes are inferred through a hash-grid $\mathcal{H}$ and a self-attention fusion over features, with shared MLPs predicting scale, rotation, color, and opacity. Training optimizes image fidelity with the loss $\mathcal{L} = (1 - \beta)\mathcal{L}_1 + \beta \mathcal{L}_{\mathrm{D-SSIM}}$ and uses a warm-up schedule to stabilize learning. Experiments on Mip-NeRF 360, Tanks&Temples, and Deep Blending show up to roughly 19–20× lower storage while matching or exceeding the PSNR of larger baselines, enabling real-time on-device rendering and broader practical deployment.
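The training objective above can be sketched in NumPy as follows. This is a minimal, non-authoritative illustration: the weight β = 0.2 is an assumption borrowed from common 3D Gaussian Splatting practice (the section does not give a value), D-SSIM is taken as 1 − SSIM (some works use (1 − SSIM)/2 instead), and SSIM uses an 11×11 separable Gaussian window with σ = 1.5 on images with values in [0, 1]. The helper names `gaussian_window`, `blur`, `ssim`, and `render_loss` are hypothetical.

```python
import numpy as np

def gaussian_window(size: int = 11, sigma: float = 1.5) -> np.ndarray:
    """Normalized 1-D Gaussian kernel used as a separable SSIM window."""
    x = np.arange(size) - (size - 1) / 2.0
    w = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()

def blur(img: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Separable 'valid' filtering of a 2-D image with the 1-D kernel w."""
    rows = np.apply_along_axis(np.convolve, 1, img, w, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, w, mode="valid")

def ssim(a: np.ndarray, b: np.ndarray, c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> float:
    """Mean SSIM for images in [0, 1]; (H, W, C) inputs are averaged over channels."""
    if a.ndim == 3:
        return float(np.mean([ssim(a[..., c], b[..., c], c1, c2) for c in range(a.shape[-1])]))
    w = gaussian_window()
    mu_a, mu_b = blur(a, w), blur(b, w)
    var_a = blur(a * a, w) - mu_a ** 2
    var_b = blur(b * b, w) - mu_b ** 2
    cov = blur(a * b, w) - mu_a * mu_b
    num = (2.0 * mu_a * mu_b + c1) * (2.0 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(np.mean(num / den))

def render_loss(pred: np.ndarray, gt: np.ndarray, beta: float = 0.2) -> float:
    """L = (1 - beta) * L1 + beta * L_D-SSIM, with L_D-SSIM taken as 1 - SSIM (assumption)."""
    l1 = np.mean(np.abs(pred - gt))
    d_ssim = 1.0 - ssim(pred, gt)
    return float((1.0 - beta) * l1 + beta * d_ssim)

# Usage on synthetic data: identical images give (near) zero loss, noisy ones a positive loss.
rng = np.random.default_rng(0)
gt = rng.uniform(0.0, 1.0, size=(32, 32, 3))
noisy = np.clip(gt + rng.normal(0.0, 0.05, size=gt.shape), 0.0, 1.0)
loss_same = render_loss(gt, gt)
loss_noisy = render_loss(noisy, gt)
```

Setting `beta=0.0` reduces the objective to the plain L1 term, which is a quick way to check that the weighting behaves as the formula says.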

Abstract

Recent approaches that represent 3D objects and scenes with Gaussian splats show increased rendering speed across a variety of platforms and devices. While rendering such representations is indeed extremely efficient, storing and transmitting them is often prohibitively expensive. Representing a large-scale scene often requires millions of 3D Gaussians, occupying gigabytes of disk space. This is a very practical limitation that hinders widespread adoption. Several solutions have been proposed to strike a balance between disk size and rendering quality, but they noticeably reduce visual quality. In this work, we propose a new representation that dramatically reduces the disk footprint while offering similar or improved quality compared to standard 3D Gaussian splats. Compared to other compact solutions, ours offers higher-quality renderings with significantly less storage and runs efficiently on a mobile device in real time. Our key observation is that nearby points in a scene can share similar representations, so only a small fraction of the 3D points needs to be stored. We introduce an approach to identify these points, which we call parent points. The discarded points, called children points, along with their attributes, can be efficiently predicted by tiny MLPs.
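To make the parent/child idea concrete, the following NumPy sketch expands stored parent positions into child positions with the rule $x_k = x_p + g_{pos}(f_\Delta)[k]$ from the TL;DR. It is a hypothetical illustration, not the authors' code. It assumes positions already contracted into $[0,1)^3$, a single-resolution hash grid with trilinear interpolation in the style of Instant-NGP standing in for $\mathcal{H}$, a fixed number K = 4 of children per parent, and a two-layer MLP with random weights for $g_{pos}$. All sizes (`TABLE_SIZE`, `FEAT_DIM`, `RES`, `K`) and function names are made up for the example.

```python
import numpy as np

# Hypothetical hyper-parameters; the section does not specify any of them.
TABLE_SIZE = 2 ** 14   # number of entries in the hash table
FEAT_DIM = 8           # width of the displacement feature f_Delta
RES = 64               # resolution of the single grid level used here
K = 4                  # children per parent (fixed here for simplicity)
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)  # Instant-NGP spatial-hash primes

rng = np.random.default_rng(0)
table = rng.normal(0.0, 0.1, size=(TABLE_SIZE, FEAT_DIM))  # learnable in practice, random here

def hash_vertices(v: np.ndarray) -> np.ndarray:
    """Map integer grid vertices of shape (N, 3) to hash-table indices of shape (N,)."""
    v = v.astype(np.uint64)
    h = (v[:, 0] * PRIMES[0]) ^ (v[:, 1] * PRIMES[1]) ^ (v[:, 2] * PRIMES[2])
    return (h % np.uint64(TABLE_SIZE)).astype(np.int64)

def query(x: np.ndarray) -> np.ndarray:
    """Trilinearly interpolate hash-grid features at points x in [0, 1)^3, shape (N, 3)."""
    s = x * RES
    base = np.floor(s).astype(np.int64)
    t = s - base                                   # fractional position inside the cell
    out = np.zeros((x.shape[0], FEAT_DIM))
    for corner in range(8):                        # the 8 corners of the enclosing cell
        offset = np.array([(corner >> i) & 1 for i in range(3)])
        w = np.prod(np.where(offset == 1, t, 1.0 - t), axis=1)
        out += w[:, None] * table[hash_vertices(base + offset)]
    return out

# g_pos: a tiny two-layer MLP mapping f_Delta to K three-dimensional offsets.
W1, b1 = rng.normal(0.0, 0.3, size=(FEAT_DIM, 32)), np.zeros(32)
W2, b2 = rng.normal(0.0, 0.01, size=(32, K * 3)), np.zeros(K * 3)

def g_pos(f_delta: np.ndarray) -> np.ndarray:
    h = np.maximum(f_delta @ W1 + b1, 0.0)         # ReLU hidden layer
    return (h @ W2 + b2).reshape(-1, K, 3)         # (P, K, 3) offsets

def expand_children(parents: np.ndarray) -> np.ndarray:
    """Child positions x_k = x_p + g_pos(f_Delta)[k] for every parent x_p."""
    offsets = g_pos(query(parents))
    return parents[:, None, :] + offsets           # (P, K, 3)

# Usage: only the 100 parent positions (plus grid and MLP weights) are stored.
parents = rng.uniform(0.0, 1.0, size=(100, 3))
children = expand_children(parents)
```

In this sketch the child positions are recomputed on demand rather than written to disk, which is the source of the storage saving the abstract describes.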


Paper Structure

This paper contains 18 sections, 2 equations, 7 figures, 6 tables, and 2 algorithms.

Introduction
Related Work
Preliminaries
Method
Neural Representation for Lightweight Predictive Splats
Adaptive Tree Manipulation
Training
Experiments
Comparison Results
Ablation Analysis
Conclusion
Implementation Details
Settings of Learning Rate
Details for Contraction
Analysis of Warm-up
...and 3 more sections

Figures (7)

Figure 1: Top: We use a parent node to estimate its children nodes and their Gaussian attributes. The parent node retrieves a pair of features from the feature grid: attribute features ($f_\mathrm{a}$) and displacement features ($f_\mathrm{\Delta}$). The displacement features $f_\mathrm{\Delta}$ are used to estimate the positions of the children nodes. To estimate the Gaussian attributes (scale, rotation, color, and opacity), the attribute features $f_\mathrm{a}$ are aggregated with self-attention. Bottom left: the process of querying the features. Bottom right: a visualization of parent nodes and their predicted children nodes.
Figure 2: PSNR of several configurations of our method and of prior works, computed on the dataset introduced by mip-nerf360.
Figure 3: Visual comparisons with methods offering efficient GS representations (compactgs, lightgs). We magnify regions to show qualitative differences. Our approach (C3) renders high-quality images while greatly reducing storage. Zoom in for greater detail.
Figure 4: Visual comparison of models trained with and without ATM. The model trained without ATM fails to capture intricate details in the scene.
Figure 5: The effect of Adaptive Tree Manipulation (ATM). Yellow points indicate splats that kept their parent status throughout the entire optimization. Green points represent former children that have been promoted to parents. Around 80% of parents result from the ATM operation.
...and 2 more figures
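The attribute branch in Figure 1 (self-attention over the attribute features $f_\mathrm{a}$, followed by shared MLPs for scale, rotation, color, and opacity) might look roughly like the NumPy sketch below. It rests on several assumptions not stated in the section: each parent contributes a set of S = 1 + K feature vectors (hypothetically, one for the parent and one per child), attention is single-head with a residual connection, and the output activations follow the usual 3D Gaussian Splatting conventions (exponential for scale, unit-normalized quaternion for rotation, sigmoid for color and opacity). Color is predicted as plain RGB for simplicity. Weights are random, and names such as `predict_attributes` and `make_head` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16   # hypothetical width of the attribute feature f_a
H = 32   # hypothetical hidden width of each shared MLP head

# Projection matrices for single-head self-attention (random stand-ins for learned weights).
Wq, Wk, Wv = (rng.normal(0.0, D ** -0.5, size=(D, D)) for _ in range(3))

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(f: np.ndarray) -> np.ndarray:
    """Fuse a set of features per parent: f has shape (B, S, D) and so does the output."""
    q, k, v = f @ Wq, f @ Wk, f @ Wv
    attn = softmax(q @ np.swapaxes(k, -1, -2) / np.sqrt(D))   # (B, S, S) attention weights
    return f + attn @ v                                        # residual connection (assumed)

def make_head(out_dim: int):
    """A tiny ReLU MLP; one instance is shared by every splat."""
    W1 = rng.normal(0.0, D ** -0.5, size=(D, H))
    W2 = rng.normal(0.0, H ** -0.5, size=(H, out_dim))
    return lambda x: np.maximum(x @ W1, 0.0) @ W2

heads = {"scale": make_head(3), "rotation": make_head(4), "color": make_head(3), "opacity": make_head(1)}

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def predict_attributes(f_a: np.ndarray) -> dict:
    """Map attribute features (B, S, D) to per-splat Gaussian attributes."""
    fused = self_attention(f_a)
    q = heads["rotation"](fused)
    return {
        "scale": np.exp(heads["scale"](fused)),                              # positive scales
        "rotation": q / (np.linalg.norm(q, axis=-1, keepdims=True) + 1e-8),  # unit quaternion
        "color": sigmoid(heads["color"](fused)),                             # RGB in (0, 1)
        "opacity": sigmoid(heads["opacity"](fused)),                         # alpha in (0, 1)
    }

# Usage: 100 parents, each with a set of 1 + 4 attribute features (an assumption).
f_a = rng.normal(0.0, 1.0, size=(100, 5, D))
attrs = predict_attributes(f_a)
```

Because the heads are shared across all splats, the number of stored network parameters stays fixed no matter how many children are predicted.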
