Reference-based Controllable Scene Stylization with Gaussian Splatting

Yiqun Mei, Jiacong Xu, Vishal M. Patel

TL;DR

This work tackles reference-based 3D scene stylization by building on a pretrained 3D Gaussian Splatting (3DGS) model to enable real-time stylized view synthesis. It introduces a texture-guided Gaussian control that adaptively densifies Gaussians in texture-rich regions and a depth-based regularization that preserves the original geometry, complemented by view-consistent supervision via pseudo views and the Template Correspondence Matching loss. The approach yields state-of-the-art stylization quality with high-frequency texture fidelity while maintaining real-time rendering speeds, outperforming NeRF-based and prior 3D stylization methods. This supports practical applications in digital art, filmmaking, and immersive VR by enabling high-quality, interactive 3D appearance editing guided by a content-aligned reference image.

Abstract

Reference-based scene stylization, which edits the appearance of a scene based on a content-aligned reference image, is an emerging research area. Starting with a pretrained neural radiance field (NeRF), existing methods typically learn a novel appearance that matches the given style. Despite their effectiveness, they inherently suffer from time-consuming volume rendering and are thus impractical for many real-time applications. In this work, we propose ReGS, which adapts 3D Gaussian Splatting (3DGS) for reference-based stylization to enable real-time stylized view synthesis. Editing the appearance of a pretrained 3DGS is challenging because it uses discrete Gaussians as its 3D representation, which tightly bind appearance to geometry. Simply optimizing the appearance, as prior methods do, is often insufficient for modeling continuous textures in the given reference image. To address this challenge, we propose a novel texture-guided control mechanism that adaptively adjusts the locally responsible Gaussians to a new geometric arrangement that can express the desired texture details. The proposed process is guided by texture cues for effective appearance editing and regularized by scene depth to preserve the original geometric structure. With these novel designs, we show that ReGS can produce state-of-the-art stylization results that respect the reference texture while retaining real-time rendering speed for free-view navigation.
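The two ingredients named in the abstract, selecting Gaussians by a texture cue and regularizing with scene depth, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the texture cue is the per-Gaussian color-gradient magnitude and that the depth regularizer is a mean absolute difference against depth rendered from the pretrained model; the abstract specifies neither. The function names `select_texture_underfit` and `depth_regularization` are hypothetical.

```python
import numpy as np

def select_texture_underfit(color_grad, k):
    """Pick the k Gaussians whose color gradient is largest.

    color_grad: (N, 3) array, gradient of the stylization loss with respect
    to each Gaussian's color (assumed texture cue, not confirmed by the source).
    """
    grad_norm = np.linalg.norm(color_grad, axis=1)
    k = min(k, grad_norm.shape[0])
    # Sort by descending gradient magnitude and keep the first k indices.
    return np.argsort(-grad_norm)[:k]

def depth_regularization(rendered_depth, reference_depth):
    """Mean absolute difference between the current depth map and the depth
    rendered from the pretrained (unstylized) model (assumed L1 form)."""
    return float(np.mean(np.abs(rendered_depth - reference_depth)))

# Toy data: four Gaussians with per-channel color gradients.
color_grad = np.array([[0.0, 0.1, 0.0],
                       [0.9, 0.2, 0.1],
                       [0.0, 0.0, 0.0],
                       [0.3, 0.4, 0.0]])
picked = select_texture_underfit(color_grad, k=2)

# Toy 2x2 depth maps before and after a stylization step.
rendered = np.array([[1.0, 2.0], [3.0, 4.0]])
reference = np.array([[1.0, 2.5], [3.0, 3.5]])
penalty = depth_regularization(rendered, reference)
```

In a full pipeline the selected Gaussians would then be densified and the penalty added to the stylization loss; both steps are omitted here because the abstract does not describe them.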

Paper Structure (17 sections, 8 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: Given a pretrained 3DGS model of the target scene and its paired style reference, ReGS enables real-time stylized view synthesis (at 134 FPS) with high-fidelity texture well aligned with the reference. In contrast, only optimizing the appearance of 3DGS (denoted as Naive 3DGS), as previous methods [arf, stylerf, stylizednerf, ref-npr, hypernetwork] do, fails to capture many texture details in the reference. We tackle the challenges in high-fidelity appearance editing with a texture-guided control mechanism that is significantly more effective than the default density control [3dgaussiansplatting] in addressing texture underfitting. Side-by-side comparisons with default density control can be found in Figure 5.
  • Figure 2: An overview of ReGS. (a) The proposed method starts with a pretrained content 3DGS of the target scene, and (b) outputs a stylized 3DGS that follows the reference. (c) We propose Texture-Guided Gaussian Control that can progressively resolve texture underfitting by automatically locating responsible Gaussians and adjusting local geometry layout for fitting high-frequency textures. (d) Once training is done, our method enables real-time stylized scene navigation.
  • Figure 3: Examples of (a) depth maps rendered using the paper's depth rendering equation and (b) synthesized stylized pseudo views.
  • Figure 4: Ablation study on different components of ReGS. (a) Optimizing only the appearance of a 3DGS model cannot reproduce texture details. (b) Removing depth regularization causes Gaussians to float off the surface and distort the original geometry. (c) Without pseudo-view supervision, results lack view consistency. (d) Our full model produces the best results, faithfully respecting the texture in the reference.
  • Figure 5: Effectiveness of Texture-Guided Control. We conduct controlled experiments by limiting the number of newly densified Gaussians throughout optimization. The pretrained model contains 0.3M Gaussians. The proposed texture-guided control can more faithfully reproduce the target texture details with a small number of Gaussians added (0.05M). The default strategy struggles to capture high-frequency details, even with a large number of Gaussians added (0.25M).
  • ...and 1 more figure