StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting

Kunhao Liu, Fangneng Zhan, Muyu Xu, Christian Theobalt, Ling Shao, Shijian Lu

TL;DR

Extensive experiments show that StyleGaussian achieves instant 3D stylization with superior stylization quality while preserving real-time rendering and strict multi-view consistency.

Abstract

We introduce StyleGaussian, a novel 3D style transfer technique that allows instant transfer of any image's style to a 3D scene at 10 frames per second (fps). Leveraging 3D Gaussian Splatting (3DGS), StyleGaussian achieves style transfer without compromising its real-time rendering ability and multi-view consistency. It achieves instant style transfer with three steps: embedding, transfer, and decoding. Initially, 2D VGG scene features are embedded into reconstructed 3D Gaussians. Next, the embedded features are transformed according to a reference style image. Finally, the transformed features are decoded into the stylized RGB. StyleGaussian has two novel designs. The first is an efficient feature rendering strategy that first renders low-dimensional features and then maps them into high-dimensional features while embedding VGG features. It cuts the memory consumption significantly and enables 3DGS to render the high-dimensional memory-intensive features. The second is a K-nearest-neighbor-based 3D CNN. Working as the decoder for the stylized features, it eliminates the 2D CNN operations that compromise strict multi-view consistency. Extensive experiments show that StyleGaussian achieves instant 3D stylization with superior stylization quality while preserving real-time rendering and strict multi-view consistency. Project page: https://kunhao-liu.github.io/StyleGaussian/
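The efficient feature rendering strategy described above relies on alpha blending being linear in the per-Gaussian features. The following is a minimal pure-Python sketch of that idea, not the authors' implementation: the function names, the toy dimensions (3 and 8), the random affine map, and the assumption that the blending weights along a ray are normalized to sum to 1 are all ours. It checks that rendering low-dimensional features and then applying an affine map gives the same pixel feature as mapping each Gaussian's feature first and then blending.

```python
import random


def blend_weights(alphas):
    """Front-to-back compositing weights: w_i = alpha_i * prod_{j<i} (1 - alpha_j)."""
    weights, transmittance = [], 1.0
    for alpha in alphas:
        weights.append(alpha * transmittance)
        transmittance *= 1.0 - alpha
    return weights


def affine(A, b, x):
    """Apply y = A x + b, with A given as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) + bias for row, bias in zip(A, b)]


def blend(weights, feats):
    """Weighted sum of per-Gaussian feature vectors along one ray."""
    dim = len(feats[0])
    return [sum(w * f[k] for w, f in zip(weights, feats)) for k in range(dim)]


def demo(seed=0, n_gaussians=5, low_dim=3, high_dim=8):
    rng = random.Random(seed)
    alphas = [rng.uniform(0.1, 0.9) for _ in range(n_gaussians)]
    weights = blend_weights(alphas)
    # Assumption (ours): weights are normalized so they sum to 1.
    # Without this, the bias term would be scaled by sum(weights).
    total = sum(weights)
    weights = [w / total for w in weights]

    low_feats = [[rng.gauss(0, 1) for _ in range(low_dim)] for _ in range(n_gaussians)]
    A = [[rng.gauss(0, 1) for _ in range(low_dim)] for _ in range(high_dim)]
    b = [rng.gauss(0, 1) for _ in range(high_dim)]

    # Route (i): render (blend) the low-dimensional features, then map the pixel.
    pixel_then_map = affine(A, b, blend(weights, low_feats))
    # Route (ii): map every Gaussian's feature, then blend the high-dimensional ones.
    map_then_blend = blend(weights, [affine(A, b, f) for f in low_feats])
    return pixel_then_map, map_then_blend


if __name__ == "__main__":
    route_i, route_ii = demo()
    print(max(abs(x - y) for x, y in zip(route_i, route_ii)))
```

Under these assumptions, only the low-dimensional features have to pass through the rasterizer during training, while the high-dimensional per-Gaussian features can still be recovered afterwards from the same affine map.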

Paper Structure

This paper contains 16 sections, 10 equations, 11 figures, 1 table.

Figures (11)

  • Figure 1: We introduce StyleGaussian, a novel 3D style transfer pipeline that enables instant style transfer while ensuring strict multi-view consistency. We mask the background for aesthetics; complete stylized novel views are shown in the experiments section.
  • Figure 2: Overview of StyleGaussian. Given reconstructed 3D Gaussians $\mathbb{G}$, we first embed the VGG features into the 3D Gaussians ($e$). Then, given a style image, we transform the features of the embedded Gaussians $\mathbb{G}^e$ to obtain $\mathbb{G}^t$, whose features are infused with the style information ($t$). Lastly, we decode the transformed features of $\mathbb{G}^t$ into RGB to produce the final stylized 3D Gaussians $\mathbb{G}^s$ ($d$). We design an efficient feature rendering strategy in $e$ that enables rendering high-dimensional VGG features while learning to embed them into $\mathbb{G}$. We also develop a KNN-based 3D CNN as the decoder in $d$.
  • Figure 3: Efficient feature rendering. We first render the low-dimensional features $\boldsymbol{F}'$ and then map them to high-dimensional features $\boldsymbol{F}$. Part $\text{i}$ corresponds to Eq. \ref{eq:affine}, where $\text{T}$ denotes the affine transformation applied to $\boldsymbol{F}'$. Part $\text{ii}$ corresponds to Eq. \ref{eq:derivation}, where $\text{T}'$ denotes the affine transformation applied to $\boldsymbol{f}'_i$. $\text{T}$ can be reformulated as $\text{T}'$, enabling the derivation of the high-dimensional feature $\boldsymbol{f}_i$ from $\boldsymbol{f}'_i$ for each Gaussian.
  • Figure 4: Illustration of the KNN-based convolution. We treat each Gaussian’s K nearest neighbors as the sliding window. The left side shows the Gaussians in one layer; the right side shows the Gaussians in the next layer.
  • Figure 5: Qualitative results. Comparison of StyleGaussian with two zero-shot radiance field style transfer methods: HyperNet [chiang2022stylizing] and StyleRF [liu2023stylerf]. StyleGaussian demonstrates superior style transfer quality, with closer alignment to the style reference images and better content preservation.
  • ...and 6 more figures
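
The KNN-based convolution in Figure 4 can be illustrated with a small pure-Python sketch. This is a hedged toy version rather than the paper's decoder: the function names, the brute-force neighbor search, the choice to order each window by distance (with the Gaussian itself first), and the absence of any activation are assumptions of ours. Each Gaussian gathers the features of its K nearest neighbors and applies one shared set of weights, in the same way a 2D convolution applies one kernel to every sliding window.

```python
import math


def knn_indices(points, k):
    """For each point, indices of its k nearest points (itself included), nearest first."""
    result = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        result.append(order[:k])
    return result


def knn_conv(points, feats, weight, bias, k):
    """One KNN convolution layer over per-Gaussian features.

    weight[slot][c_in][c_out] is shared by all Gaussians; `slot` is the
    neighbor's rank in the distance-sorted window (an assumption of this sketch).
    """
    out = []
    for neighbors in knn_indices(points, k):
        row = list(bias)
        for slot, j in enumerate(neighbors):
            for c_in, value in enumerate(feats[j]):
                for c_out in range(len(bias)):
                    row[c_out] += weight[slot][c_in][c_out] * value
        out.append(row)
    return out


if __name__ == "__main__":
    # Four hypothetical Gaussian centers on a line, one feature channel each.
    centers = [(0, 0, 0), (1, 0, 0), (3, 0, 0), (7, 0, 0)]
    features = [[1.0], [2.0], [4.0], [8.0]]
    # k = 2, one input and one output channel: average each Gaussian with its nearest neighbor.
    averaging = [[[0.5]], [[0.5]]]
    print(knn_conv(centers, features, averaging, [0.0], k=2))
```

Because a layer like this operates on the Gaussians in 3D rather than on rendered 2D images, its output does not depend on the camera, which is consistent with the strict multi-view consistency the abstract attributes to the decoder.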