InstantStyleGaussian: Efficient Art Style Transfer with 3D Gaussian Splatting

Xin-Yi Yu, Jun-Xin Yu, Li-Bo Zhou, Yan Wei, Lin-Lin Ou

TL;DR

InstantStyleGaussian targets fast, multi-view-consistent 3D style transfer on existing 3D Gaussian Splatting (3DGS) scenes. It combines an image-conditioned diffusion model (InstantStyle) with an improved Iterative Dataset Update: rendered 2D views are stylized and the edits are propagated back to the 3DGS representation, with edge maps and an NNFM loss preserving scene structure. The approach produces high-quality stylization in roughly 20 minutes per scene, with better multi-view consistency than prior 3D editing methods, as demonstrated on Tanks & Temples and Mip-NeRF 360. This makes it practical for content creation in games, VR, and AR. The authors note limitations with geometric edits and object insertion or removal, which are left for future work.

Abstract

We present InstantStyleGaussian, an innovative 3D style transfer method based on the 3D Gaussian Splatting (3DGS) scene representation. By inputting a target-style image, it quickly generates new 3D GS scenes. Our method operates on pre-reconstructed GS scenes, combining diffusion models with an improved iterative dataset update strategy. It utilizes diffusion models to generate target style images, adds these new images to the training dataset, and uses this dataset to iteratively update and optimize the GS scenes, significantly accelerating the style editing process while ensuring the quality of the generated scenes. Extensive experimental results demonstrate that our method ensures high-quality stylized scenes while offering significant advantages in style transfer speed and consistency.
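The iterative dataset update described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `render`, `stylize`, and `optimize` callables are hypothetical stand-ins for the 3DGS renderer, the InstantStyle diffusion model, and the Gaussian optimization step, and the `rounds` and `subset_size` parameters are assumed names. The sketch also assumes each stylized image replaces the stored image for its view, whereas the source says only that new images are added to the training dataset.

```python
import random
from typing import Callable, List, Sequence, TypeVar

Scene = TypeVar("Scene")
Camera = TypeVar("Camera")
Image = TypeVar("Image")


def iterative_dataset_update(
    scene: Scene,
    cameras: Sequence[Camera],
    dataset: List[Image],
    style: Image,
    render: Callable[[Scene, Camera], Image],      # hypothetical 3DGS renderer
    stylize: Callable[[Image, Image], Image],      # hypothetical diffusion editor
    optimize: Callable[[Scene, Sequence[Camera], Sequence[Image]], Scene],
    rounds: int = 10,
    subset_size: int = 4,
    seed: int = 0,
) -> Scene:
    """Alternate between restyling a subset of views and refitting the scene."""
    rng = random.Random(seed)
    for _ in range(rounds):
        # Pick a subset of views to refresh in this round.
        count = min(subset_size, len(cameras))
        chosen = rng.sample(range(len(cameras)), count)
        for i in chosen:
            # Render the current scene, then stylize that render.
            rendered = render(scene, cameras[i])
            # Assumption: the stylized image replaces the stored view.
            dataset[i] = stylize(rendered, style)
        # Refit the scene against the partly updated dataset.
        scene = optimize(scene, cameras, dataset)
    return scene
```

Because each round renders from the current scene before stylizing, later edits are conditioned on earlier ones, which is what lets per-view 2D edits gradually converge toward a single consistent 3D result.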

Paper Structure (16 sections, 2 equations, 6 figures, 1 table)

This paper contains 16 sections, 2 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: We introduce InstantStyleGaussian, an innovative 3D style transfer pipeline. By inputting a target style image, editing can commence, enabling swift style transformation while ensuring consistency across multiple views. The experimental section demonstrates additional scenes from new perspectives following the stylization process.
  • Figure 2: Overview: Our method iteratively updates a subset of the GS dataset images to edit and reconstruct the GS scenes: (1) capture rendered images from the reconstructed scene, (2) process these images and the specified style image through InstantStyle to generate new images, (3) add the new images to the training dataset, and (4) continuously iterate to update and optimize the GS scenes.
  • Figure 3: Qualitative Evaluation. Compared to the effects of StyleGaussian [29], our method demonstrates superior style transfer quality, better matching the reference style while preserving the original content more effectively.
  • Figure 4: Without NNFM Loss, the quality of style transfer significantly decreases and does not maintain multi-view consistency.
  • Figure 5: Increasing the number of iterations leads to overfitting of textures in local regions.
  • ...and 1 more figure
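The NNFM loss mentioned in the TL;DR and in the Figure 4 caption is not defined on this page. As commonly defined in earlier 3D stylization work, a nearest-neighbor feature matching loss averages, over every feature vector of the rendered image, the smallest cosine distance to any feature vector of the style image. The sketch below assumes that definition and operates on plain lists of feature vectors. The feature extractor (typically a pretrained CNN) is omitted, and the function names are illustrative rather than taken from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity, with a small epsilon for stability."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-12)


def nnfm_loss(rendered_feats: Sequence[Vector],
              style_feats: Sequence[Vector]) -> float:
    """Mean distance from each rendered feature to its nearest style feature."""
    if not rendered_feats or not style_feats:
        raise ValueError("both feature sets must be non-empty")
    total = 0.0
    for feat in rendered_feats:
        # The nearest neighbor is the style feature with the smallest distance.
        total += min(cosine_distance(feat, s) for s in style_feats)
    return total / len(rendered_feats)
```

Matching each rendered feature to its own nearest style feature, rather than comparing global statistics, lets local texture patterns from the style image transfer directly. That is consistent with the drop in stylization quality the Figure 4 caption reports when the loss is removed.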