Gaussian Splatting in Style

Abhishek Saroha, Mariia Gladkova, Cecilia Curreli, Dominik Muhle, Tarun Yenamandra, Daniel Cremers

TL;DR

The paper tackles 3D scene stylization with strong view consistency by adopting 3D Gaussian Splatting as an explicit scene representation. It introduces Gaussian Splatting in Style (GSS), which adds a 3D Color module conditioned on style images and a 2D AdaIN-based guide loss to achieve coherent stylization across views in real time. Through joint end-to-end training and a warm-up strategy, GSS preserves geometry while transferring style, outperforming NeRF-based and other baselines in short-term and long-term consistency and achieving around $157$ FPS. The approach enables practical AR/VR use cases by delivering high-quality, view-consistent stylized novel views without per-style scene fitting, and it supports style interpolation and test-time generalization to unseen styles.
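
The AdaIN operation that the guide loss is named after can be sketched in a few lines. This is a minimal illustration of adaptive instance normalization on raw arrays, assuming channel-first feature maps of shape (C, H, W). The function name `adain`, the `eps` value, and the random inputs are illustrative assumptions, and the sketch does not reproduce how GSS builds its loss from the result.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Align per-channel mean/std of `content` to those of `style`.

    Both inputs are assumed to be feature maps of shape (C, H, W);
    their spatial sizes may differ, the channel count may not.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True) + eps
    # Normalize the content statistics, then re-scale and shift to the style's.
    return s_std * (content - c_mean) / c_std + s_mean

# Hypothetical feature maps standing in for encoder activations.
rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(4, 8, 8))
style = rng.normal(3.0, 2.0, size=(4, 6, 6))
out = adain(content, style)
```

After the call, `out` keeps the spatial layout of `content` while its per-channel mean and standard deviation match those of `style` up to the `eps` term.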

Abstract

3D scene stylization extends neural style transfer to 3D. A key challenge in this problem is keeping the stylized appearance uniform across multiple views. The vast majority of previous works achieve this by training a 3D model for every stylized image and a set of multi-view images. In contrast, we propose a novel architecture trained on a collection of style images that, at test time, produces real-time, high-quality stylized novel views. We choose 3D Gaussian splatting as the underlying 3D scene representation for our model. We take the 3D Gaussians and process them using a multi-resolution hash grid and a tiny MLP to obtain stylized views. The MLP is conditioned on different style codes so that it generalizes to different styles at test time. The explicit nature of 3D Gaussians gives us inherent advantages over NeRF-based methods, including geometric consistency and a fast training and rendering regime. This makes our method useful for various practical use cases, such as augmented or virtual reality. We demonstrate that our method achieves state-of-the-art performance with superior visual quality on various indoor and outdoor real-world data.
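
The idea of a tiny MLP that maps per-Gaussian features and a style code to a color can be illustrated with a small sketch. Everything below is an assumption made for illustration: the random `features` array stands in for the multi-resolution hash-grid encoding, the style code is injected by simple concatenation, and the layer sizes, random weights, and the name `stylized_colors` are hypothetical rather than the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper's actual dimensions are not given here.
N_GAUSSIANS, FEAT_DIM, STYLE_DIM, HIDDEN = 1000, 32, 16, 64

# Stand-in for per-Gaussian hash-grid features (random, for illustration only).
features = rng.normal(size=(N_GAUSSIANS, FEAT_DIM))

# Randomly initialized weights of a two-layer MLP (untrained).
W1 = rng.normal(scale=0.1, size=(FEAT_DIM + STYLE_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, 3))
b2 = np.zeros(3)

def stylized_colors(features, style_code):
    """Predict one RGB color per Gaussian, conditioned on a style code."""
    # Repeat the single style code for every Gaussian and concatenate.
    tiled = np.broadcast_to(style_code, (features.shape[0], style_code.shape[0]))
    x = np.concatenate([features, tiled], axis=1)
    h = np.maximum(x @ W1 + b1, 0.0)      # ReLU hidden layer
    logits = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid keeps colors in (0, 1)

style_a = rng.normal(size=STYLE_DIM)
style_b = rng.normal(size=STYLE_DIM)
colors_a = stylized_colors(features, style_a)
colors_b = stylized_colors(features, style_b)

# Blending two style codes is one plausible way to interpolate styles.
alpha = 0.5
colors_mix = stylized_colors(features, (1 - alpha) * style_a + alpha * style_b)
```

Because only the colors change with the style code, the Gaussians' positions and shapes are shared across styles in this sketch, which is the property that makes view-consistent stylization natural for an explicit representation.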

Paper Structure

This paper contains 19 sections, 6 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Given multi-view images of a real-world scene, we perform the task of scene stylization. Unlike previous scene stylization approaches, we do not need to fit a scene to each new style. Employing a neural network, conditioned on a style image, allows us to generalize to a variety of styles. Our method can generate 3D consistent scene stylization at approximately $150$ FPS.
  • Figure 2: The motivation for our work stems from the need for a specialized method that takes spatial information into account while stylizing a scene. We show that, to generate stylized novel views of a scene, it is insufficient to stylize the rendered views or to train a scene representation model on stylized 2D images. Doing so leads to a loss of information, such as the deformity in the solid truck's body shown above.
  • Figure 3: Here we diagrammatically show the overall architecture of our pipeline. We employ a novel 3D Color module, jointly trained with the 3D Gaussians, to predict the new color of each Gaussian based on the querying style image at test time.
  • Figure 4: We provide a detailed qualitative comparison of our method against the baselines described in the experiments section on the (a) TnT and (b) LLFF datasets. Our method achieves a highly accurate stylization based on the input style. While methods such as ARF obtain a better texture, this is because ARF is optimized separately for each style. StylizedNeRF produces images that suffer from over-smoothing and blurriness, while StyleRF fails to capture the accurate style color. In contrast, our proposed method retains the fine details present in the unstyled view while transferring the appropriate texture and colors of the style image for both indoor and outdoor real-world datasets.
  • Figure 5: We show the effect of a joint-training regime, in which the 3D Gaussians are trained in conjunction with the 3D Color module. Training jointly in an end-to-end fashion helps preserve key details and geometry in the rendered stylized novel view.
  • ...and 2 more figures