Gaussian Splatting in Style
Abhishek Saroha, Mariia Gladkova, Cecilia Curreli, Dominik Muhle, Tarun Yenamandra, Daniel Cremers
TL;DR
The paper tackles 3D scene stylization with strong view consistency by adopting 3D Gaussian Splatting as an explicit scene representation. It introduces Gaussian Splatting in Style (GSS), which adds a 3D Color module conditioned on style images and a 2D AdaIN-based guide loss to achieve coherent stylization across views in real time. Through joint end-to-end training and a warm-up strategy, GSS preserves geometry while transferring style. It outperforms NeRF-based and other baselines in short-term and long-term consistency and renders at around 157 FPS. The approach enables practical AR/VR use cases by delivering high-quality, view-consistent stylized novel views without per-style scene fitting, and it supports style interpolation and test-time generalization to unseen styles.
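For readers unfamiliar with AdaIN (adaptive instance normalization), the operation behind the 2D guide loss mentioned above, the following is a minimal NumPy sketch of the standard formula: content features are normalized per channel and then rescaled to the per-channel mean and standard deviation of the style features. The function name `adain`, the `(C, H, W)` layout, and the random inputs are illustrative assumptions; how GSS builds its guide loss on top of this operation is not specified in this summary.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Standard AdaIN on feature maps of shape (C, H, W).

    Hypothetical helper for illustration; not taken from the GSS code base.
    """
    # Per-channel statistics over the spatial dimensions.
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps  # eps avoids division by zero
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    # Normalize the content features, then match the style statistics.
    return s_std * (content - c_mean) / c_std + s_mean

# Random stand-ins for encoder features (assumed shapes, not real data).
rng = np.random.default_rng(0)
content_feat = rng.normal(size=(4, 8, 8))
style_feat = rng.normal(loc=2.0, scale=3.0, size=(4, 6, 6))
out = adain(content_feat, style_feat)
```

The output keeps the spatial layout of the content features while carrying the channel-wise statistics of the style features, which is why the content and style maps may differ in spatial size.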
Abstract
3D scene stylization extends neural style transfer to 3D. A key challenge in this problem is to keep the stylized appearance uniform across multiple views. The vast majority of previous works achieve this by training a 3D model for every stylized image and a set of multi-view images. In contrast, we propose a novel architecture trained on a collection of style images that, at test time, produces real-time, high-quality stylized novel views. We use 3D Gaussian splatting as the underlying 3D scene representation for our model. We take the 3D Gaussians and process them using a multi-resolution hash grid and a tiny MLP to obtain stylized views. The MLP is conditioned on different style codes so that it generalizes to different styles at test time. The explicit nature of 3D Gaussians gives us inherent advantages over NeRF-based methods, including geometric consistency and fast training and rendering. This makes our method useful for various practical use cases, such as augmented or virtual reality. We demonstrate that our method achieves state-of-the-art performance with superior visual quality on various indoor and outdoor real-world data.
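The pipeline described above (Gaussians, then a multi-resolution hash grid, then a tiny style-conditioned MLP) can be illustrated with a rough NumPy sketch. Everything specific here is an assumption made for illustration: that the Gaussian centers are what is queried in the grid, the nearest-vertex lookup (practical hash grids usually interpolate between vertices), the hash primes, the layer sizes, the random weights, and the way the style code is concatenated to the grid features. It is not the authors' implementation.

```python
import numpy as np

# Spatial-hash primes commonly used for hash-grid encodings (an assumption here).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_encode(xyz, tables, base_res=16, growth=2.0):
    """Nearest-vertex multi-resolution hash lookup (simplified, no interpolation).

    xyz: (N, 3) positions in [0, 1); tables: one (T, F) feature table per level.
    """
    feats = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)            # grid resolution at this level
        cell = np.floor(xyz * res).astype(np.uint64)     # integer cell coordinates
        prod = cell * PRIMES
        h = (prod[:, 0] ^ prod[:, 1] ^ prod[:, 2]) % np.uint64(table.shape[0])
        feats.append(table[h.astype(np.int64)])
    return np.concatenate(feats, axis=1)                 # (N, levels * F)

def stylized_colors(centers, style_code, tables, params):
    """Tiny two-layer MLP mapping [grid features, style code] to RGB in [0, 1]."""
    feat = hash_encode(centers, tables)
    code = np.tile(style_code, (feat.shape[0], 1))       # same style code for every Gaussian
    x = np.concatenate([feat, code], axis=1)
    hidden = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # ReLU
    logits = hidden @ params["w2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logits))                 # sigmoid keeps colors in range

# Hypothetical sizes and random, untrained weights.
rng = np.random.default_rng(0)
num_levels, table_size, feat_dim, style_dim, hidden_dim = 2, 64, 2, 8, 16
tables = [rng.normal(scale=0.1, size=(table_size, feat_dim)) for _ in range(num_levels)]
in_dim = num_levels * feat_dim + style_dim
params = {
    "w1": rng.normal(scale=0.1, size=(in_dim, hidden_dim)),
    "b1": np.zeros(hidden_dim),
    "w2": rng.normal(scale=0.1, size=(hidden_dim, 3)),
    "b2": np.zeros(3),
}
centers = rng.uniform(0.0, 1.0, size=(5, 3))   # stand-in Gaussian centers
style_code = rng.normal(size=style_dim)        # stand-in style embedding
rgb = stylized_colors(centers, style_code, tables, params)
```

In this sketch only the per-Gaussian colors depend on the style code, while positions and shapes are left untouched, which mirrors the geometric consistency the abstract attributes to the explicit representation.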
