DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation
Xiaoliang Ju, Hongsheng Li
TL;DR
DirectTriGS introduces a triplane-based Gaussian Splatting field and a differentiable TriRenderer to enable direct, efficient 3D GS generation from text. A Triplane VAE compresses the representation and a two-stage latent diffusion model generates geometry and GS attributes, with an optional SDS-based texture refinement for enhanced realism. Experiments on Objaverse and OmniObject3D show competitive geometry and rendering quality compared with strong baselines and existing GS methods, while achieving faster, end-to-end generation without heavy 2D lifting. The approach offers a practical, memory-efficient path to high-quality 3D content creation with direct GS manipulation and editable geometry-texture coupling.
Abstract
We present DirectTriGS, a novel framework designed for 3D object generation with Gaussian Splatting (GS). GS-based rendering for 3D content has gained considerable attention recently. However, directly generating 3D Gaussians has received limited exploration compared to traditional generative modeling approaches. The main challenge lies in the complex data structure of GS, which is represented by discrete point clouds with multiple channels. To overcome this challenge, we propose employing the triplane representation, which allows us to represent Gaussian Splatting as an image-like continuous field. This representation effectively encodes both geometry and texture information, and a TriRenderer converts it smoothly back to Gaussian point clouds and renders them into images, with only 2D supervision. The proposed TriRenderer is fully differentiable, so the rendering loss can supervise both texture and geometry encoding. Furthermore, the triplane representation can be compressed using a Variational Autoencoder (VAE), which can subsequently be utilized in latent diffusion to generate 3D objects. The experiments demonstrate that the proposed generation framework can produce high-quality 3D object geometry and rendering results in the text-to-3D task.
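To make the idea of an "image-like continuous field" concrete, the following is a minimal NumPy sketch of a generic triplane lookup, not the authors' implementation. It assumes three axis-aligned feature planes (XY, XZ, YZ) stored as an array of shape `(3, C, H, W)`, query points normalized to `[-1, 1]^3`, bilinear interpolation with an align-corners convention, and summation to combine the three per-plane features. The function names `bilinear_sample` and `query_triplane`, the plane ordering, and the choice of summation are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def bilinear_sample(plane, uv):
    """Bilinearly sample a feature plane.

    plane: (C, H, W) feature image.
    uv:    (N, 2) coordinates in [-1, 1]; u indexes width, v indexes height.
    Returns an (N, C) array of interpolated features.
    """
    _, H, W = plane.shape
    # Map normalized coordinates to pixel coordinates (align-corners convention).
    x = np.clip((uv[:, 0] + 1.0) * 0.5 * (W - 1), 0, W - 1)
    y = np.clip((uv[:, 1] + 1.0) * 0.5 * (H - 1), 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = x - x0
    wy = y - y0
    # Interpolate along width, then along height; intermediate shapes are (C, N).
    top = plane[:, y0, x0] * (1 - wx) + plane[:, y0, x1] * wx
    bottom = plane[:, y1, x0] * (1 - wx) + plane[:, y1, x1] * wx
    return (top * (1 - wy) + bottom * wy).T


def query_triplane(triplane, points):
    """Look up per-point features from a triplane.

    triplane: (3, C, H, W), assumed ordering XY, XZ, YZ (hypothetical).
    points:   (N, 3) positions in [-1, 1]^3.
    Returns (N, C) features; per-plane features are summed (an assumption).
    """
    f_xy = bilinear_sample(triplane[0], points[:, [0, 1]])
    f_xz = bilinear_sample(triplane[1], points[:, [0, 2]])
    f_yz = bilinear_sample(triplane[2], points[:, [1, 2]])
    return f_xy + f_xz + f_yz


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    triplane = rng.normal(size=(3, 16, 32, 32))
    points = rng.uniform(-1.0, 1.0, size=(5, 3))
    features = query_triplane(triplane, points)
    print(features.shape)
```

In a pipeline of the kind the abstract describes, the per-point features returned by such a lookup would be decoded into Gaussian attributes and rendered. That decoding step is omitted here because the abstract does not describe its form.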
