UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping

Aashish Rai, Dilin Wang, Mihir Jain, Nikolaos Sarafianos, Kefan Chen, Srinath Sridhar, Aayush Prakash

TL;DR

UVGS reframes unstructured 3D Gaussian Splatting by projecting its primitives onto a 2D UV map via spherical mapping, producing a structured UVGS representation and a compact 3-channel Super UVGS representation. A multi-branch mapping network compresses the 14-channel UVGS into a 3-channel space, enabling direct use with pretrained 2D diffusion models for unconditional and conditional 3DGS generation, and even inpainting. The approach supports multiple UV layers, so that several Gaussians falling on the same pixel can be kept, and its UV resolution can be increased to hold more Gaussians, enabling high-fidelity 3D generation without heavy 3D backbones. Experiments on Objaverse show competitive performance in unconditional and conditional generation and demonstrate 3DGS inpainting with diffusion priors.

Abstract

3D Gaussian Splatting (3DGS) has demonstrated superior quality in modeling 3D objects and scenes. However, generating 3DGS remains challenging due to their discrete, unstructured, and permutation-invariant nature. In this work, we present a simple yet effective method to overcome these challenges. We utilize spherical mapping to transform 3DGS into a structured 2D representation, termed UVGS. UVGS can be viewed as multi-channel images, with feature dimensions as a concatenation of Gaussian attributes such as position, scale, color, opacity, and rotation. We further find that these heterogeneous features can be compressed into a lower-dimensional (e.g., 3-channel) shared feature space using a carefully designed multi-branch network. The compressed UVGS can be treated as typical RGB images. Remarkably, we discover that typical VAEs trained with latent diffusion models can directly generalize to this new representation without additional training. Our novel representation makes it effortless to leverage foundational 2D models, such as diffusion models, to directly model 3DGS. Additionally, one can simply increase the 2D UV resolution to accommodate more Gaussians, making UVGS a scalable solution compared to typical 3D backbones. This approach immediately unlocks various novel generation applications of 3DGS by inherently utilizing the already developed superior 2D generation capabilities. In our experiments, we demonstrate various unconditional, conditional generation, and inpainting applications of 3DGS based on diffusion models, which were previously non-trivial.
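The spherical mapping and attribute concatenation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `spherical_uv` and `pack_attributes`, the equirectangular angle convention, and the 3+3+3+1+4 channel split (RGB color, quaternion rotation) are assumptions chosen only because they add up to the 14 channels the paper mentions.

```python
import math

def spherical_uv(x, y, z, height, width):
    """Map a 3D point to a (row, col) pixel on a height x width UV grid.

    Assumed convention: azimuth from atan2(y, x), polar angle from the +z axis.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        # The direction is undefined at the origin; pick a fixed pixel.
        return 0, 0
    theta = math.atan2(y, x)                      # azimuth in [-pi, pi]
    phi = math.acos(max(-1.0, min(1.0, z / r)))   # polar angle in [0, pi]
    u = (theta + math.pi) / (2.0 * math.pi)       # normalized to [0, 1]
    v = phi / math.pi                             # normalized to [0, 1]
    # Clamp so that u == 1 or v == 1 still lands on the last pixel.
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return row, col

def pack_attributes(position, scale, color, opacity, rotation):
    """Concatenate per-Gaussian attributes into one 14-value feature vector.

    Assumed layout: position (3), scale (3), color (3), opacity (1),
    rotation quaternion (4).
    """
    return [*position, *scale, *color, opacity, *rotation]

# A Gaussian on the +z axis lands in the top row, middle column.
row, col = spherical_uv(0.0, 0.0, 1.0, height=4, width=8)
feature = pack_attributes((0.0, 0.0, 1.0), (0.1, 0.1, 0.1),
                          (1.0, 0.5, 0.2), 0.9, (1.0, 0.0, 0.0, 0.0))
print(row, col, len(feature))
```

Writing each feature vector to its pixel produces a multi-channel image, and raising `height` and `width` makes room for more Gaussians, which is the scalability argument made above.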

Paper Structure

This paper contains 20 sections, 13 equations, 15 figures, 4 tables, 1 algorithm.

Figures (15)

  • Figure 1: Qualitative results of reconstructing a 3DGS object with a pretrained image autoencoder (A) via Super UVGS. We obtain UVGS maps (U) through spherical projection of 3DGS objects, then apply the forward mapping network to get Super UVGS (S). The pretrained AE reconstructs Super UVGS (S'), which the inverse mapping network converts back to UVGS maps (U'). Finally, inverse spherical mapping yields the predicted 3DGS object, which has the same appearance and geometry as the input object with minimal loss.
  • Figure 2: The input 3DGS object is first converted to UVGS maps through spherical mapping. We use a multi-branch forward mapping network to convert the resulting 14-channel UVGS to a compact 3-channel Super UVGS image. This represents the 3DGS object in a structured manner and can be used with image foundation models for reconstruction or generation. The Super UVGS is mapped back to UVGS through branched inverse mapping, and the UVGS is in turn converted back to the 3DGS object through inverse spherical mapping.
  • Figure 2: Complex object reconstructions (K=4) using a pretrained image-based autoencoder.
  • Figure 3: Dynamic Selection. In spherical mapping of 3DGS points to UV maps, multiple points may map to the same pixel, creating a many-to-one issue. Our Dynamic Selection approach addresses this by retaining the attributes of the point with the highest opacity per pixel on the same ray.
  • Figure 3: Reconstruction of a real-world scene for different K values. A smaller K worsens the many-to-one issue, so the reconstruction lacks detail.
  • ...and 10 more figures
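
The Dynamic Selection step from the Figure 3 caption can be sketched as follows. This is a hedged illustration rather than the paper's code: the function name `dynamic_selection` and its inputs are hypothetical, and filling K layers with the K highest-opacity Gaussians per pixel is an assumption (the caption only states that the highest-opacity point is retained per pixel).

```python
from collections import defaultdict

def dynamic_selection(pixels, opacities, k=1):
    """Resolve many-to-one collisions on a UV map.

    pixels:    list of (row, col) tuples, one per Gaussian
    opacities: list of floats, one per Gaussian
    k:         number of UV layers to keep per pixel (assumed top-k rule)

    Returns a dict mapping each occupied pixel to a list of Gaussian
    indices, ordered from highest to lowest opacity.
    """
    buckets = defaultdict(list)
    for idx, px in enumerate(pixels):
        buckets[px].append(idx)

    layers = {}
    for px, idxs in buckets.items():
        # Sort colliding Gaussians by opacity, most opaque first.
        idxs.sort(key=lambda i: opacities[i], reverse=True)
        layers[px] = idxs[:k]
    return layers

# Three Gaussians collide at pixel (0, 0); one sits alone at (1, 2).
pixels = [(0, 0), (0, 0), (1, 2), (0, 0)]
opacities = [0.2, 0.9, 0.5, 0.6]
print(dynamic_selection(pixels, opacities, k=1))
print(dynamic_selection(pixels, opacities, k=2))
```

With `k=1` only the most opaque Gaussian survives at each pixel; a larger `k` keeps more of the colliding Gaussians, which matches the observation that a smaller K loses detail.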