Learning Unified Representation of 3D Gaussian Splatting

Yuelin Xin, Yuheng Liu, Xiaohui Xie, Xinke Li

TL;DR

The paper addresses the difficulty of learning on 3D Gaussian Splatting (3DGS) representations, arguing that the native parametric space suffers from non-uniqueness, numerical heterogeneity, and an incompatible manifold structure. It introduces a geometry-aware submanifold-field representation that maps each Gaussian primitive to a color field on its iso-probability surface and enforces a unique correspondence to the underlying radiance field. A Submanifold Field Variational Autoencoder (SF-VAE) encodes these fields into compact embeddings, with a Wasserstein-2-based Manifold Distance guiding training toward perceptual quality rather than parameter-space distance. On the ShapeSplat and Mip-NeRF 360 datasets, the SF-based embeddings yield higher-fidelity reconstructions, stronger cross-domain generalization, and more stable latent spaces, and a Gaussian Neural Field shows improved learnability when conditioned on SF embeddings. The findings suggest that geometry-aware embeddings are a robust learning target for 3D Gaussian Splatting and pave the way for diffusion-inspired generation, compression, and downstream neural-field applications.

Abstract

A well-designed vectorized representation is crucial for the learning systems natively based on 3D Gaussian Splatting. While 3DGS enables efficient and explicit 3D reconstruction, its parameter-based representation remains hard to learn as features, especially for neural-network-based models. Directly feeding raw Gaussian parameters into learning frameworks fails to address the non-unique and heterogeneous nature of the Gaussian parameterization, yielding highly data-dependent models. This challenge motivates us to explore a more principled approach to represent 3D Gaussian Splatting in neural networks that preserves the underlying color and geometric structure while enforcing unique mapping and channel homogeneity. In this paper, we propose an embedding representation of 3DGS based on continuous submanifold fields that encapsulate the intrinsic information of Gaussian primitives, thereby benefiting the learning of 3DGS. Implementation available at https://github.com/cilix-ai/gs-embedding

Paper Structure

This paper contains 24 sections, 2 theorems, 22 equations, 11 figures, 4 tables, 2 algorithms.

Key Result

Proposition 1

The parametric representation of an SGRF is not unique. Formally, there exist at least two distinct parameter sets, $\boldsymbol{\theta}_1 \in \Theta$ and $\boldsymbol{\theta}_2 \in \Theta$ with $\boldsymbol{\theta}_1 \neq \boldsymbol{\theta}_2$, that generate the exact same field $\phi_{\mathcal{G}}$ ...
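The non-uniqueness stated above can be illustrated with a small numerical check. This is a minimal sketch, not the paper's proof: it assumes the common 3DGS parameterization in which a primitive's covariance is $\Sigma = R(q)\,\mathrm{diag}(s)^2 R(q)^\top$ for a quaternion $q$ and per-axis scales $s$, and it holds mean, opacity, and color fixed. The helper names (`quat_to_rot`, `covariance`, `max_abs_diff`) and the numeric values are hypothetical.

```python
# Sketch under an assumed parameterization: Sigma = R(q) diag(s)^2 R(q)^T.
# All names and values here are illustrative, not taken from the paper's code.

def quat_to_rot(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = (w * w + x * x + y * y + z * z) ** 0.5
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(q, s):
    """Build Sigma = R diag(s)^2 R^T from a quaternion and three scales."""
    R = quat_to_rot(q)
    return [
        [sum(R[i][k] * s[k] ** 2 * R[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]

def max_abs_diff(A, B):
    """Largest entry-wise absolute difference between two 3x3 matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

# Case 1: q and -q encode the same rotation, so the covariance is unchanged.
theta_1 = ((0.9, 0.1, 0.3, 0.2), (0.5, 1.0, 2.0))
theta_2 = (tuple(-c for c in theta_1[0]), theta_1[1])
sigma_1 = covariance(*theta_1)
sigma_2 = covariance(*theta_2)

# Case 2: rotating 90 degrees about z while swapping the x and y scales
# describes the same ellipsoid with different parameters.
h = 0.5 ** 0.5
sigma_a = covariance((1.0, 0.0, 0.0, 0.0), (0.5, 1.0, 2.0))
sigma_b = covariance((h, 0.0, 0.0, h), (1.0, 0.5, 2.0))

print("q vs -q:", max_abs_diff(sigma_1, sigma_2))
print("axis swap:", max_abs_diff(sigma_a, sigma_b))
```

In both cases the parameter tuples differ while the resulting covariance agrees up to floating-point error, which is the kind of ambiguity a network trained directly on raw parameters has to absorb.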

Figures (11)

  • Figure 1: A scene of $N$ Gaussian primitives can be represented by $N$ sets of parameters $\boldsymbol{\theta}$ (shown in pink). Data in this parametric space resides on different manifolds and is heterogeneous and non-Euclidean, which makes it hard for encoders to fit the disparate data manifolds implicitly. Shown in purple is the proposed representation: instead of relying on the Gaussian parameterization, we introduce a canonical submanifold field space $(\mathcal{M}, F)$ that uniquely represents a Gaussian primitive with an iso-probability surface.
  • Figure 2: To embed the proposed submanifold field representation into a vector form suitable for neural networks, we devise a Submanifold Field Variational Auto-encoder (SF-VAE) that embeds any input submanifold field as a 32-D vector, then reconstructs the original parameter set $\boldsymbol{\theta}_i$. SF-VAE learns in our new representation space instead of the parametric space.
  • Figure 3: Qualitative results for rasterized reconstruction. Samples selected arbitrarily from Mip-NeRF 360 and ShapeSplat. Parametric models can induce confusion in parameter space, failing to embed and restore the correct Gaussian parameters.
  • Figure 4: Reconstruction results using embeddings with noise. Left: Visualization of reconstructed scene from noisy embeddings of Gaussian parameters (MLP) and SF-VAE. Right: Comparison on M-Dist for different noise levels added to embedding space, tested on Mip-NeRF 360. Noise level is defined as the ratio between the noise magnitude and the embedding variance.
  • Figure 5: Unsupervised graph clustering based on raw Gaussian parameters and various embeddings. Submanifold field embeddings better preserve detailed semantics, demonstrating their downstream applicability.
  • ...and 6 more figures

Theorems & Definitions (3)

  • Definition 1: Single Gaussian Radiance Field (SGRF)
  • Proposition 1: Non-uniqueness of the SGRF Parametric Representation
  • Proposition 2: Uniqueness of Submanifold Field Representation
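To make the iso-probability surface mentioned in the Figure 1 caption concrete, the sketch below samples points on one such surface and checks that they all lie at the same Mahalanobis distance from the mean. It is an illustration under stated assumptions, not the paper's sampling procedure: the covariance is assumed to be $\Sigma = R(q)\,\mathrm{diag}(s)^2 R(q)^\top$, the level `r`, the Fibonacci-lattice directions, and all function names are hypothetical, and the color field defined on the surface is omitted.

```python
import math

# Sketch only: geometry of an iso-probability surface of one Gaussian.
# Assumes Sigma = R(q) diag(s)^2 R(q)^T; names and values are illustrative.

def quat_to_rot(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def unit_directions(n):
    """Roughly uniform unit vectors from a Fibonacci lattice on the sphere."""
    golden_angle = math.pi * (3 - math.sqrt(5))
    dirs = []
    for i in range(n):
        z = 1 - 2 * (i + 0.5) / n
        rad = math.sqrt(1 - z * z)
        dirs.append((rad * math.cos(golden_angle * i),
                     rad * math.sin(golden_angle * i), z))
    return dirs

def iso_surface_point(mu, q, s, r, u):
    """Map a unit direction u to the surface point mu + r * R diag(s) u."""
    R = quat_to_rot(q)
    return [mu[i] + r * sum(R[i][k] * s[k] * u[k] for k in range(3))
            for i in range(3)]

def mahalanobis(x, mu, q, s):
    """sqrt((x - mu)^T Sigma^{-1} (x - mu)) computed in the local frame."""
    R = quat_to_rot(q)
    d = [x[i] - mu[i] for i in range(3)]
    # Rotate into the Gaussian's axes (R^T d), then divide by the scales.
    local = [sum(R[i][k] * d[i] for i in range(3)) / s[k] for k in range(3)]
    return math.sqrt(sum(v * v for v in local))

mu, q, s, r = (1.0, -2.0, 0.5), (0.9, 0.1, 0.3, 0.2), (0.5, 1.0, 2.0), 1.5
points = [iso_surface_point(mu, q, s, r, u) for u in unit_directions(64)]
distances = [mahalanobis(p, mu, q, s) for p in points]
print(min(distances), max(distances))
```

Every sampled point sits at Mahalanobis distance `r`, so the Gaussian density is constant on the sampled ellipsoid. The surface depends only on the mean and covariance, not on which equivalent quaternion or scale ordering produced them.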