FantasyStyle: Controllable Stylized Distillation for 3D Gaussian Splatting

Yitong Yang, Yinglin Wang, Changshuo Wang, Huajie Wang, Shuting He

TL;DR

FantasyStyle tackles multi-view inconsistency and content leakage in 3D Gaussian Splatting style transfer by shifting from VGG-based guidance to diffusion-prior distillation. It introduces Multi-View Frequency Consistency to preserve high-frequency textures while stabilizing across views, and Controllable Stylized Distillation with negative guidance to suppress content leakage. By removing the reconstruction term and using discrete timesteps, it achieves sharper brushstroke isolation and faster optimization. Across diverse scenes and styles, it outperforms state-of-the-art 3DGS style-transfer methods in stylization quality and content preservation. This work enables diffusion-based 3D style transfer and facilitates extending 2D diffusion techniques to 3D content.

Abstract

The success of 3DGS in generative and editing applications has sparked growing interest in 3DGS-based style transfer. However, current methods still face two major challenges: (1) multi-view inconsistency often leads to style conflicts, resulting in appearance smoothing and distortion; and (2) heavy reliance on VGG features, which struggle to disentangle style and content from style images, often causes content leakage and excessive stylization. To tackle these issues, we introduce FantasyStyle, a 3DGS-based style transfer framework and the first to rely entirely on diffusion model distillation. It comprises two key components. (1) Multi-View Frequency Consistency: we enhance cross-view consistency by applying a 3D filter to the multi-view noisy latents, selectively reducing low-frequency components to mitigate stylized prior conflicts. (2) Controllable Stylized Distillation: to suppress content leakage from style images, we introduce negative guidance to exclude undesired content. In addition, we identify the limitations of Score Distillation Sampling and Delta Denoising Score in 3D style transfer and remove the reconstruction term accordingly. Building on these insights, we propose a controllable stylized distillation that leverages negative guidance to optimize the 3D Gaussians more effectively. Extensive experiments demonstrate that our method consistently outperforms state-of-the-art approaches, achieving higher stylization quality and visual realism across various scenes and styles. The code is available at https://github.com/yangyt46/FantasyStyle.
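
The negative guidance mentioned in the abstract can be illustrated with the common negative-prompt form of classifier-free guidance. This is only a sketch under stated assumptions: the abstract does not give the paper's exact formula, so the function name `guided_noise`, the parameter `guidance_scale`, and the toy arrays standing in for noise predictions are all hypothetical.

```python
import numpy as np


def guided_noise(eps_style, eps_negative, guidance_scale=7.5):
    """Combine two noise predictions so the result moves away from the negative one.

    Assumption: this is the widely used negative-prompt variant of
    classifier-free guidance, not necessarily the formula used by FantasyStyle.
    eps_style    -- prediction conditioned on the desired style
    eps_negative -- prediction conditioned on the content to be excluded
    """
    eps_style = np.asarray(eps_style, dtype=np.float64)
    eps_negative = np.asarray(eps_negative, dtype=np.float64)
    if eps_style.shape != eps_negative.shape:
        raise ValueError("both predictions must have the same shape")
    # Start from the negative prediction and extrapolate toward the style one.
    return eps_negative + guidance_scale * (eps_style - eps_negative)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for denoiser outputs on a 4-channel 8x8 latent.
    eps_s = rng.standard_normal((4, 8, 8))
    eps_n = rng.standard_normal((4, 8, 8))
    print(guided_noise(eps_s, eps_n, guidance_scale=5.0).shape)
```

With `guidance_scale=1.0` the function returns the style-conditioned prediction unchanged; larger values push the result further from whatever the negative condition describes.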

Paper Structure

This paper contains 12 sections, 11 equations, 9 figures, 2 tables, 1 algorithm.

Figures (9)

  • Figure 1: Limitations of the previous works. VGG-based methods often cause over-stylization and content leakage from the style image. In contrast, our approach ensures faithful style transfer and content preservation.
  • Figure 2: Overview of our proposed method. Inspired by DDS [hertz2023delta], FantasyStyle consists of two distinct pathways, Source Image and Rendered Image, each highlighted in a different color. We propose Controllable Stylized Distillation (CSD) to optimize the 3D scene. In the Rendered Image pathway, we introduce Multi-View Frequency Consistency (MVFC) and inject style image features to obtain a multi-view consistent 2D stylized prior. Additionally, we incorporate negative guidance to suppress potential content leakage from the style image.
  • Figure 3: The role of different frequency components. For intuitive visualization, we present multi-view 2D stylization results. We observe that selectively removing low-frequency components slightly reduces local detail while significantly improving multi-view consistency, whereas removing high-frequency components severely degrades texture features, resulting in blurred appearances.
  • Figure 4: Visual results of MVFC on multi-view 2D stylization. Without MVFC, conflicting stylizations arise across views.
  • Figure 5: Qualitative comparison of different methods. Our approach achieves superior style transfer quality compared to existing methods. Zoom in for a better view.
  • ...and 4 more figures
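
The frequency manipulation described in the Figure 3 caption (reducing low-frequency components while leaving high-frequency ones intact) can be sketched with a 3D FFT taken over the view, height, and width axes of a stack of latents. This is a minimal illustration, not the paper's filter: the function name `attenuate_low_frequencies`, the hard radial cutoff, and the values of `cutoff` and `low_gain` are assumptions chosen for clarity.

```python
import numpy as np


def attenuate_low_frequencies(latents, cutoff=0.25, low_gain=0.5):
    """Scale down low-frequency content of multi-view latents.

    latents  -- array of shape (views, channels, height, width)
    cutoff   -- normalized frequency radius (cycles per sample) below which
                components count as "low" (hypothetical choice)
    low_gain -- multiplier applied to those low-frequency components
    """
    latents = np.asarray(latents, dtype=np.float64)
    if latents.ndim != 4:
        raise ValueError("expected shape (views, channels, height, width)")
    v, _, h, w = latents.shape

    # Per-axis frequency grids, broadcast to a (views, height, width) radius.
    fv = np.fft.fftfreq(v)[:, None, None]
    fh = np.fft.fftfreq(h)[None, :, None]
    fw = np.fft.fftfreq(w)[None, None, :]
    radius = np.sqrt(fv**2 + fh**2 + fw**2)

    # Hard mask: low frequencies are scaled, high frequencies pass unchanged.
    gain = np.where(radius <= cutoff, low_gain, 1.0)

    # 3D FFT over views and both spatial axes; channels are filtered independently.
    spectrum = np.fft.fftn(latents, axes=(0, 2, 3))
    filtered = spectrum * gain[:, None, :, :]
    # The mask is symmetric in frequency, so the inverse transform is real
    # up to floating-point error.
    return np.fft.ifftn(filtered, axes=(0, 2, 3)).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stack = rng.standard_normal((6, 4, 16, 16))  # 6 views of a 4x16x16 latent
    print(attenuate_low_frequencies(stack).shape)
```

Swapping the mask so that components above the cutoff are scaled instead would correspond to the high-frequency removal that the caption reports as blurring textures.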