Feed-Forward 3D Gaussian Splatting Compression with Long-Context Modeling

Zhening Liu, Rui Song, Yushi Huang, Yingdong Hu, Xinjie Zhang, Jiawei Shao, Zehong Lin, Jun Zhang

TL;DR

The paper tackles the large data footprint of 3D Gaussian Splatting (3DGS) by introducing LocoMoco, a feed-forward compression framework that models long-range dependencies across thousands of Gaussians. It uses Morton serialization to build large context windows, and combines a serialized-attention transform with a fine-grained space-channel autoregressive context model to improve rate-distortion performance. The method yields approximately 20× compression for 3DGS in feed-forward inference and outperforms prior generalizable codecs (notably FCGS) with about a 10% BD-Rate improvement, while maintaining practical encoding and decoding speeds. The results indicate that long-context modeling is crucial for efficient, scalable 3DGS compression of real-world, in-the-wild data.

Abstract

3D Gaussian Splatting (3DGS) has emerged as a revolutionary 3D representation. However, its substantial data size poses a major barrier to widespread adoption. While feed-forward 3DGS compression offers a practical alternative to costly per-scene per-train compressors, existing methods struggle to model long-range spatial dependencies, due to the limited receptive field of transform coding networks and the inadequate context capacity in entropy models. In this work, we propose a novel feed-forward 3DGS compression framework that effectively models long-range correlations to enable highly compact and generalizable 3D representations. Central to our approach is a large-scale context structure that comprises thousands of Gaussians based on Morton serialization. We then design a fine-grained space-channel auto-regressive entropy model to fully leverage this expansive context. Furthermore, we develop an attention-based transform coding model to extract informative latent priors by aggregating features from a wide range of neighboring Gaussians. Our method yields a $20\times$ compression ratio for 3DGS in a feed-forward inference and achieves state-of-the-art performance among generalizable codecs.
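
The Morton-serialized context structure mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`morton3d`, `quantize`, `morton_windows`), the 10-bit quantization grid, and the `window_size` parameter are all assumptions made for illustration. The sketch quantizes Gaussian centers to an integer grid, interleaves the coordinate bits into a Morton (Z-order) code, sorts by that code, and cuts the sorted sequence into fixed-size context windows.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def morton3d(ix: int, iy: int, iz: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of three grid indices into one Z-order code."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


def quantize(points: Sequence[Point], bits: int = 10) -> List[Tuple[int, int, int]]:
    """Map float positions to integer cells of a cubic grid (assumed scheme)."""
    lo = [min(p[d] for p in points) for d in range(3)]
    hi = [max(p[d] for p in points) for d in range(3)]
    extent = max(hi[d] - lo[d] for d in range(3))  # one scale keeps cells cubic
    scale = (1 << bits) - 1
    if extent == 0:
        return [(0, 0, 0) for _ in points]
    return [
        tuple(int((p[d] - lo[d]) / extent * scale) for d in range(3))
        for p in points
    ]


def morton_windows(
    points: Sequence[Point], window_size: int, bits: int = 10
) -> List[List[int]]:
    """Return point indices grouped into consecutive Morton-ordered windows."""
    cells = quantize(points, bits)
    order = sorted(range(len(points)), key=lambda i: morton3d(*cells[i], bits))
    return [order[s:s + window_size] for s in range(0, len(order), window_size)]


if __name__ == "__main__":
    # Eight corners of a unit cube, listed in reverse Z-order.
    corners = [(x, y, z) for z in (1.0, 0.0) for y in (1.0, 0.0) for x in (1.0, 0.0)]
    for window in morton_windows(corners, window_size=4, bits=1):
        print([corners[i] for i in window])
```

Because nearby cells share the high bits of their Morton codes, points that fall into the same window tend to be spatial neighbors, which is the property the paper relies on. A window size in the thousands would match the scale described in the abstract; the value of 4 above is only for readability.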

Paper Structure

This paper contains 33 sections, 11 equations, 16 figures, 6 tables.

Figures (16)

  • Figure 1: (a) Context illustration of the existing method [chen2024fast], which uses a limited receptive field that covers only a few Gaussians. (b) Visualization of the Bonsai scene. (c) Visualization of the corresponding 3DGS point cloud, where points are partitioned into Morton-serialized context windows. Points from the same context window are painted with the same color. (d) Correlations of Gaussian attributes between a target point and other points within the same context window. Strong correlations persist between distant Gaussians. These visualizations illustrate the strong yet underexplored correlations among a large number of 3DGS primitives, verifying the need for a large receptive field and motivating our long-context modeling design.
  • Figure 2: Visualizations of the Morton order in (a) a 2D plane and (b) 3D space. We further show (c) a visualization of the Drum scene in the NeRF-Synthetic dataset [nerf] and (d) the corresponding 3DGS point cloud, where points are colored according to their indices in the Morton-serialized sequence. These visualizations illustrate that Morton serialization preserves spatial proximity, which makes our context structure design effective.
  • Figure 3: Overview of the proposed LocoMoco. The 3DGS attributes are compressed using different settings, tailored to their individual properties. The architecture of each module is illustrated on the right-hand side. Here, AT and ST denote the analysis and synthesis transforms, respectively. Both the transform coding network and the context entropy model have long-context modeling capability.
  • Figure 4: (Left) Architecture of the Morton serialized attention block. (Right) Workflow of the space-channel context model. We illustrate the coding of the subgroup $\boldsymbol{m}^{1}_0$ and $\boldsymbol{m}^{1}_1$ based on $\boldsymbol{m}^{0}$. Here, yellow nodes indicate the latent prior $\boldsymbol{\psi}$, blue nodes denote the channel context $\boldsymbol{m}^{0}$, and green nodes represent the decoded subgroups in $\boldsymbol{m}^{1}$.
  • Figure 5: Rate-distortion curves evaluated on the DL3DV-GS [ling2024dl3dv], Mip-NeRF [barron2022mip], and Tanks & Temples [knapitsch2017tanks] datasets. The proposed LocoMoco yields significant compression compared to the vanilla 3DGS and consistently outperforms the baseline method.
  • ...and 11 more figures