GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent Spaces
Melis Ocal, Xiaoyan Xing, Yue Li, Ngo Anh Vien, Sezer Karaoglu, Theo Gevers
TL;DR
GaussianBlender is a diffusion-based, feed-forward editor for 3D Gaussian splats that eliminates per-asset optimization by learning latent priors in a disentangled geometry-appearance space. The method groups Gaussians spatially, encodes them into dual latents (one for geometry, one for appearance), and applies edits with a text-conditioned latent diffusion model; a final stage maps source latents to edited latents while preserving geometry. In quantitative evaluations and user studies, it achieves geometry-preserving, multi-view-consistent stylization with near real-time inference and generalizes to out-of-domain assets, making 3D stylization scalable for production. Compared with prior optimization-based methods and related feed-forward editors, it offers more controlled editing and more robust 3D consistency in large-scale workflows.
Abstract
3D stylization is central to game development, virtual reality, and digital art, where the demand for diverse assets calls for scalable methods that support fast, high-fidelity manipulation. Existing text-to-3D stylization methods typically distill from 2D image editors. As a result, they require time-intensive per-asset optimization and exhibit multi-view inconsistency inherited from the limitations of current text-to-image models, which makes them impractical for large-scale production. In this paper, we introduce GaussianBlender, a pioneering feed-forward framework for text-driven 3D stylization that performs edits instantly at inference. Our method learns structured, disentangled latent spaces for geometry and appearance, with controlled information sharing between them, from spatially grouped 3D Gaussians. A latent diffusion model then applies text-conditioned edits to these learned representations. Comprehensive evaluations show that GaussianBlender not only delivers instant, high-fidelity, geometry-preserving, multi-view-consistent stylization, but also surpasses methods that require per-instance test-time optimization, unlocking practical, democratized 3D stylization at scale.
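To make the idea of spatially grouping Gaussians and separating geometry from appearance more concrete, here is a minimal Python sketch. It is not the authors' implementation. The uniform voxel grid, the `Gaussian` record, the helper names `group_by_voxel` and `split_attributes`, and the choice to treat position, scale, rotation, and opacity as geometry and a single RGB color (standing in for spherical-harmonic coefficients) as appearance are all hypothetical assumptions made for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical per-Gaussian record; real 3DGS assets usually store
# spherical-harmonic coefficients rather than a single RGB color.
@dataclass
class Gaussian:
    position: tuple[float, float, float]         # center (x, y, z)
    scale: tuple[float, float, float]            # per-axis extent
    rotation: tuple[float, float, float, float]  # quaternion (w, x, y, z)
    opacity: float
    color: tuple[float, float, float]            # RGB stand-in for appearance

def group_by_voxel(gaussians, voxel_size):
    """Bucket Gaussians by the integer voxel cell containing their center (assumed grouping)."""
    groups = defaultdict(list)
    for g in gaussians:
        # math.floor (not int) keeps negative coordinates in the correct cell.
        key = tuple(math.floor(c / voxel_size) for c in g.position)
        groups[key].append(g)
    return dict(groups)

def split_attributes(group):
    """Split one group into a geometry stream and an appearance stream (assumed partition)."""
    geometry = [(*g.position, *g.scale, *g.rotation, g.opacity) for g in group]
    appearance = [g.color for g in group]
    return geometry, appearance

# Usage: three Gaussians, two of which fall in the same 0.5-unit voxel.
identity = (1.0, 0.0, 0.0, 0.0)
small = (0.01, 0.01, 0.01)
gaussians = [
    Gaussian((0.1, 0.2, 0.3), small, identity, 0.9, (1.0, 0.0, 0.0)),
    Gaussian((0.15, 0.1, 0.05), small, identity, 0.8, (0.0, 1.0, 0.0)),
    Gaussian((1.2, 0.0, -0.1), small, identity, 0.7, (0.0, 0.0, 1.0)),
]
groups = group_by_voxel(gaussians, voxel_size=0.5)
geometry, appearance = split_attributes(groups[(0, 0, 0)])
```

Bucketing by voxel gives each group a bounded spatial extent that an encoder could process independently, and splitting the attributes is one simple way to feed separate geometry and appearance encoders. The paper's actual grouping, attribute partition, and encoders may differ.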
