Feed-Forward 3D Gaussian Splatting Compression with Long-Context Modeling

Zhening Liu; Rui Song; Yushi Huang; Yingdong Hu; Xinjie Zhang; Jiawei Shao; Zehong Lin; Jun Zhang

TL;DR

The paper tackles the large data footprint of 3D Gaussian Splatting (3DGS) by introducing LocoMoco, a feed-forward compression framework that models long-range dependencies across thousands of Gaussians. It uses Morton serialization to build large context windows, and it combines a serialized-attention transform with a fine-grained space-channel autoregressive context model to improve rate-distortion performance. The method achieves approximately 20× compression for 3DGS in feed-forward inference and outperforms prior generalizable codecs (notably FCGS) by about 10% in BD-Rate, while maintaining practical encoding and decoding speeds. The work shows that long-context modeling is crucial for efficient, scalable 3DGS compression on real-world, in-the-wild data.

Abstract

3D Gaussian Splatting (3DGS) has emerged as a revolutionary 3D representation. However, its substantial data size poses a major barrier to widespread adoption. While feed-forward 3DGS compression offers a practical alternative to costly per-scene per-train compressors, existing methods struggle to model long-range spatial dependencies due to the limited receptive field of transform coding networks and the inadequate context capacity of entropy models. In this work, we propose a novel feed-forward 3DGS compression framework that effectively models long-range correlations to enable highly compact and generalizable 3D representations. Central to our approach is a large-scale context structure that comprises thousands of Gaussians based on Morton serialization. We then design a fine-grained space-channel autoregressive entropy model to fully leverage this expansive context. Furthermore, we develop an attention-based transform coding model to extract informative latent priors by aggregating features from a wide range of neighboring Gaussians. Our method yields a $20\times$ compression ratio for 3DGS in feed-forward inference and achieves state-of-the-art performance among generalizable codecs.
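
To make the Morton-serialization idea concrete, here is a minimal sketch of how Gaussian centers could be ordered along a Z-order curve and cut into contiguous context windows. It is an illustration under stated assumptions, not the authors' implementation: the function names `morton3d` and `serialize`, the grid resolution `bits`, and the tiny `window` size are hypothetical, and the abstract says only that each context comprises thousands of Gaussians.

```python
def morton3d(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three grid indices into one Morton code."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)      # x bit goes to position 3b
        code |= ((iy >> b) & 1) << (3 * b + 1)  # y bit goes to position 3b + 1
        code |= ((iz >> b) & 1) << (3 * b + 2)  # z bit goes to position 3b + 2
    return code


def serialize(positions, bits=10, window=4):
    """Sort point indices by Morton code and split them into fixed-size windows.

    `positions` is a list of (x, y, z) tuples. `window` is an illustrative
    value; a real long-context setup would use a far larger one.
    """
    lo = [min(p[a] for p in positions) for a in range(3)]
    hi = [max(p[a] for p in positions) for a in range(3)]
    scale = (1 << bits) - 1

    def cell(p):
        # Quantize each coordinate onto a 2^bits grid over the bounding box.
        return tuple(
            int((p[a] - lo[a]) / (hi[a] - lo[a]) * scale) if hi[a] > lo[a] else 0
            for a in range(3)
        )

    order = sorted(
        range(len(positions)),
        key=lambda i: morton3d(*cell(positions[i]), bits=bits),
    )
    return [order[s:s + window] for s in range(0, len(order), window)]


points = [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
windows = serialize(points, bits=1, window=2)
```

Because nearby points tend to receive nearby Morton codes, each window groups Gaussians that are mostly close together in space, which is what lets a sequence model treat one window as a single long context.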
