Smol-GS: Compact Representations for Abstract 3D Gaussian Splatting

Haishan Wang, Mohammad Hassan Vali, Arno Solin

TL;DR

Smol-GS tackles the memory inefficiency of 3D Gaussian Splatting by learning a compact representation that separates geometry from appearance cues. It encodes splat coordinates with an occupancy octree and learns low-dimensional, splat-wise features that are quantized and entropy-coded, with tiny MLP decoders reconstructing view-dependent appearance. A stage-wise training regime with adaptive density control achieves state-of-the-art compression while preserving rendering fidelity, enabling real-time rendering and potential downstream tasks such as navigation. The method demonstrates an order-of-magnitude reduction in storage on standard benchmarks, matching or surpassing prior methods in quality at a fraction of the size, and opens avenues for discrete 3D scene tokens and editing.

Abstract

We present Smol-GS, a novel method for learning compact representations for 3D Gaussian Splatting (3DGS). Our approach learns highly efficient encodings in 3D space that integrate both spatial and semantic information. The model captures the coordinates of the splats through a recursive voxel hierarchy, while splat-wise features store abstracted cues, including color, opacity, transformation, and material properties. This design allows the model to compress 3D scenes by orders of magnitude without loss of flexibility. Smol-GS achieves state-of-the-art compression on standard benchmarks while maintaining high rendering quality. Beyond visual fidelity, the discrete representations could potentially serve as a foundation for downstream tasks such as navigation, planning, and broader 3D scene understanding.
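As a rough illustration of the recursive voxel hierarchy mentioned in the abstract, the sketch below snaps 3D points to the cell centres of a grid with `2**depth` cells per axis, which is what `depth` recursive halvings of a bounding box produce. This is not the authors' implementation: the function name `quantize_points`, the use of the axis-aligned bounding box of the points, and the choice of cell centres as reconstructed coordinates are all assumptions made for this example.

```python
def quantize_points(points, depth):
    """Snap 3D points to the centres of a (2**depth)^3 grid over their bounding box.

    Hypothetical helper for illustration only. Returns the sorted list of
    occupied voxel indices and the matching dequantized cell centres.
    """
    dims = range(3)
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    cells = 2 ** depth
    # Cell size per axis; fall back to 1.0 if the points have zero extent on an axis.
    size = [(hi[d] - lo[d]) / cells or 1.0 for d in dims]

    occupied = set()
    for p in points:
        # Clamp so that points on the upper boundary land in the last cell.
        idx = tuple(min(int((p[d] - lo[d]) / size[d]), cells - 1) for d in dims)
        occupied.add(idx)

    voxels = sorted(occupied)
    centres = [tuple(lo[d] + (v[d] + 0.5) * size[d] for d in dims) for v in voxels]
    return voxels, centres


points = [(0.0, 0.0, 0.0), (0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (1.0, 1.0, 1.0)]
coarse_voxels, coarse_centres = quantize_points(points, depth=1)
fine_voxels, fine_centres = quantize_points(points, depth=4)
print(len(coarse_voxels), len(fine_voxels))
```

In this toy input, nearby points share a cell at a shallow depth and only separate at a deeper one, so a deeper hierarchy keeps more distinct positions at the cost of more occupied cells to store.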

Paper Structure

This paper contains 51 sections, 13 equations, 16 figures, 16 tables.

Figures (16)

  • Figure 1: Smol-GS learns a compact representation of the 3D scene that (i) stores coordinates in an efficient octree-like structure and (ii) abstracts away view-dependent splat features such as color, shape, and material cues. We visualize the abstract 8-dimensional features ${\bm{f}}$ in the second and third panels by coloring $\mathrm{RGB} = \mathrm{sigmoid}(\mathrm{PCA}({\bm{f}}))$, which reveals structure in the encodings. On the Mip-NeRF 360 benchmark, the previous state of the art (HAC++) compresses the 734 MB vanilla 3DGS model to 8.74 MB, while Smol-GS needs only 4.75 MB.
  • Figure 2: Qualitative results on the Mip-NeRF 360 data set. Smol-GS provides high-fidelity reconstructions, even outperforming vanilla 3DGS-30k in representing details in reflective, transparent, and low-texture regions. Smol-GS preserves sharp edges and material effects (specular highlights, glass, flat walls) while using a model that is orders of magnitude smaller. We include quantitative summary results in \ref{table:quantitative_benchmark}.
  • Figure 3: Method overview of Smol-GS: We train (trained parameters in blue) neural splats with tiny MLP decoders for view-dependent rendering (\ref{sec:neural_splats}). The coordinates are compressed with occupancy-octree coordinate coding (\ref{sec:coordinate-compression}). We also learn the quantization and arithmetic coding of splat features with NLL rate terms (\ref{sec:feature-compression}), and employ adaptive density control with stage-wise training (loss terms in red) to balance fidelity and size (\ref{sec:adc} and \ref{sec:training_stages}).
  • Figure 4: Occupancy-octree depth vs. PSNR. We quantize the coordinates of a trained 3DGS-30k model to obtain the 'Quantized-30k' models. These models are then fine-tuned for an additional 15k iterations with the quantized coordinates fixed, yielding the 'Quantized-45k' models. Top: PSNR and the ratio of splats remaining after quantization vs. the number of quantization recursions on the Garden scene. Bottom: Qualitative results of Quantized-30k. More recursions yield better quality but keep more splats.
  • Figure 5: Occupancy-octree coordinate coding: Given a point cloud (left), we recursively divide the bounding box into eight sub-boxes. Only the non-empty sub-boxes (gray) are further divided. Each division is represented by an 8-bit binary code (1 = non-empty, 0 = empty). $C_{ri}$ denotes the code of the $i$th division at the $r$th recursion. The occupancy octree (right) is constructed by arranging all codes in breadth-first order. Finally, we apply Huffman coding to further compress the octree bits.
  • ...and 11 more figures
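
The occupancy-octree coding described in the Figure 5 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function name `octree_occupancy_codes`, the bit ordering of the eight children within a code, and the tie-breaking rule for points lying exactly on a split plane are all choices made for this example, and the final Huffman coding step is omitted.

```python
from collections import deque


def octree_occupancy_codes(points, depth):
    """Return one 8-bit occupancy code per subdivided box, in breadth-first order.

    Hypothetical helper for illustration only. A set bit means the
    corresponding sub-box contains at least one point.
    """
    lo = [min(p[d] for p in points) for d in range(3)]
    hi = [max(p[d] for p in points) for d in range(3)]
    codes = []
    # Each queue entry: (box lower corner, box upper corner, points inside, level).
    queue = deque([(lo, hi, list(points), 0)])
    while queue:
        box_lo, box_hi, pts, level = queue.popleft()
        if level == depth:
            continue  # Leaf box: not divided further, so it emits no code.
        mid = [(box_lo[d] + box_hi[d]) / 2 for d in range(3)]
        children = [[] for _ in range(8)]
        for p in pts:
            # Child index uses one bit per axis (x is the most significant bit);
            # points on a split plane go to the upper half (an assumption).
            k = sum((p[d] >= mid[d]) << (2 - d) for d in range(3))
            children[k].append(p)
        code = 0
        for k, child in enumerate(children):
            if not child:
                continue  # Empty sub-boxes are never divided.
            code |= 1 << (7 - k)
            upper = [(k >> (2 - d)) & 1 for d in range(3)]
            c_lo = [mid[d] if upper[d] else box_lo[d] for d in range(3)]
            c_hi = [box_hi[d] if upper[d] else mid[d] for d in range(3)]
            queue.append((c_lo, c_hi, child, level + 1))
        codes.append(code)
    return codes


corner_points = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
codes = octree_occupancy_codes(corner_points, depth=2)
print([format(c, "08b") for c in codes])
```

Because only non-empty boxes are enqueued, the number of emitted codes grows with the occupied volume rather than with the full grid resolution; `bytes(codes)` gives the raw byte stream that an entropy coder such as Huffman coding would then compress.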