Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization

Jamie Wynn, Zawar Qureshi, Jakub Powierza, Jamie Watson, Mohamed Sayed

TL;DR

Morpheus tackles the challenge of text-driven stylization for 3D scenes by enabling geometry changes in addition to appearance. It introduces an autoregressive pipeline that stylizes frames via a dedicated RGBD diffusion model with independent appearance and depth strength controls, and propagates stylization across views with a Warp ControlNet and depth-informed feature sharing. The method retrains a 3D Gaussian Splatting (3DGS) model on the stylized frames, achieving improved multi-view consistency and more striking geometry alterations compared to prior work. Quantitative metrics and a user study demonstrate superior adherence to prompts and higher aesthetic quality, highlighting the practical impact for data-efficient 3D stylization and downstream tasks with limited training data.

Abstract

Exploring real-world spaces using novel-view synthesis is fun, and reimagining those worlds in a different style adds another layer of excitement. Stylized worlds can also be used for downstream tasks where there is limited training data and a need to expand a model's training distribution. Most current novel-view synthesis stylization techniques lack the ability to convincingly change geometry. This is because any geometry change requires increased style strength, which is often capped for stylization stability and consistency. In this work, we propose a new autoregressive 3D Gaussian Splatting stylization method. As part of this method, we contribute a new RGBD diffusion model that allows for strength control over appearance and shape stylization. To ensure consistency across stylized frames, we use a combination of novel depth-guided cross attention, feature injection, and a Warp ControlNet conditioned on composite frames for guiding the stylization of new frames. We validate our method via extensive qualitative results, quantitative experiments, and a user study. Code online.
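
The separate strength controls for appearance and shape can be illustrated with a minimal sketch. It assumes an SDEdit-style partial noising step in which the RGB and depth channels are noised to different levels before denoising. The cosine schedule, the names `alpha_bar` and `noise_rgbd`, and the choice to work on raw arrays rather than latents are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def alpha_bar(strength: float) -> float:
    """Hypothetical schedule: fraction of signal kept for a strength in [0, 1]."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    return float(np.cos(0.5 * np.pi * strength) ** 2)


def noise_rgbd(rgb, depth, rgb_strength, depth_strength, rng):
    """Noise RGB and depth independently (assumed SDEdit-style initialisation).

    A strength of 0 leaves the channel untouched; a strength of 1 replaces it
    with (almost) pure Gaussian noise, giving the denoiser full freedom.
    """
    a_rgb = alpha_bar(rgb_strength)
    a_depth = alpha_bar(depth_strength)
    noisy_rgb = np.sqrt(a_rgb) * rgb + np.sqrt(1.0 - a_rgb) * rng.standard_normal(rgb.shape)
    noisy_depth = np.sqrt(a_depth) * depth + np.sqrt(1.0 - a_depth) * rng.standard_normal(depth.shape)
    return noisy_rgb, noisy_depth


# Usage: strong appearance change, geometry left untouched.
rng = np.random.default_rng(0)
rgb = rng.random((8, 8, 3))
depth = rng.random((8, 8))
noisy_rgb, noisy_depth = noise_rgbd(rgb, depth, rgb_strength=0.8, depth_strength=0.0, rng=rng)
```

Under this assumption, raising `depth_strength` while holding `rgb_strength` fixed would let a denoiser alter shape without being forced to change color more than before, which is the kind of decoupling the abstract describes.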

Paper Structure

This paper contains 45 sections, 13 equations, 18 figures, 6 tables.

Figures (18)

  • Figure 1: We introduce a new method for novel-view stylization using text prompts. The output of our method is a stylized 3D Gaussian Splatting model, from which we show renders here. Our method allows stylization control of both appearance and shape. Using the same prompt, our method can produce different stylizations with the same overall texture but variable shape alteration, allowing for more striking shape and color stylization compared to GaussCtrl [wu2024gaussctrl]. We show multiple stylizations of the same scene. https://nianticlabs.github.io/morpheus/.
  • Figure 2: Method Overview. a) Our pipeline takes as input a novel view synthesis model, in this case a 3D Gaussian Splatting (3DGS) model, and first renders a set of representative images and their depth maps $\{I^R, D^R\}$. b) Our pipeline stylizes rendered images autoregressively. We use a novel RGBD diffusion model (Section \ref{sec:rgbd_model}) conditioned on the input RGBD render $\{I_i^R, D_i^R\}$, a stylization prompt, and stylization noise parameters that modulate the strength of appearance and shape stylization. For every subsequent frame, we warp previously stylized frames $\{I_j^S, D_j^S\}$ to the current frame and form a composite $\{I_{ij}^C, D_{ij}^C\}$. We use a Warp ControlNet (Section \ref{sec:warpingcontrolnet}) conditioned on the warped composite and a validity mask to guide the RGBD stylization of the current frame $\{I_i^R, D_i^R\}$ to produce $\{I_i^S, D_i^S\}$. During diffusion, we use depth-informed feature sharing (Section \ref{sec:depth_informed_sharing}) to propagate deep stylization features. c) We then retrain a 3DGS model using the newly stylized frames $\{I^S, D^S\}$.
  • Figure 3: For the same prompt, we vary stylization strength for geometry. a) We show the output of our RGBD model for the same stylization prompt but with varying depth stylization strengths. Note how the depths change when we ask for higher depth stylization but the overall color gamut does not. b) We show the effect of shape stylization in output 3DGS models from our method.
  • Figure 4: Feature information sharing. We show a slice through the heatmap $L$ for a single pixel in the target frame.
  • Figure 5: Qualitative Ablations. We show a) two reference frames, the results of style propagation using our full method in b) and e), and ablations in c) and f). c) Without our Warp ControlNet, the geometry and texture on the face and tie are not propagated correctly. f) When cross attending across the entire frame without depth-informed feature sharing, patches like the bear's eye may be misplaced or repeated, leading to inconsistency.
  • ...and 13 more figures
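
The warping step in the Figure 2 caption (warp previously stylized RGBD frames into the current view, form a composite, and keep a validity mask) can be sketched roughly as follows. This is a simplified illustration under stated assumptions: a pinhole camera with intrinsics `K` shared by all views, nearest-pixel forward splatting, and a per-pixel nearest-depth rule for compositing. The function names `warp_to_target` and `composite` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def warp_to_target(rgb_src, depth_src, K, T_src_to_tgt):
    """Forward-warp a source RGBD frame into a target view of the same size.

    rgb_src: (H, W, 3) floats, depth_src: (H, W) z-depth (0 means missing),
    K: (3, 3) pinhole intrinsics (assumed shared by both views),
    T_src_to_tgt: (4, 4) rigid transform from source to target camera space.
    Returns warped rgb, warped depth, and a boolean validity mask.
    """
    h, w = depth_src.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])  # (3, N) homogeneous pixels
    z = depth_src.ravel()

    # Unproject to 3D, move into the target camera, and project again.
    pts_src = (np.linalg.inv(K) @ pix) * z
    pts_tgt = T_src_to_tgt[:3, :3] @ pts_src + T_src_to_tgt[:3, 3:4]
    z_tgt = pts_tgt[2]
    front = (z > 0) & (z_tgt > 1e-6)
    safe_z = np.where(front, z_tgt, 1.0)  # avoid dividing by non-positive depth
    proj = K @ pts_tgt
    u_t = np.round(proj[0] / safe_z).astype(int)
    v_t = np.round(proj[1] / safe_z).astype(int)
    keep = front & (u_t >= 0) & (u_t < w) & (v_t >= 0) & (v_t < h)

    # Z-buffer: for each target pixel keep only the nearest source point.
    idx = np.flatnonzero(keep)
    flat_t = v_t[idx] * w + u_t[idx]
    order = np.lexsort((z_tgt[idx], flat_t))  # sort by pixel, then by depth
    idx, flat_t = idx[order], flat_t[order]
    _, first = np.unique(flat_t, return_index=True)
    idx, flat_t = idx[first], flat_t[first]

    rgb_out = np.zeros((h * w, 3), dtype=rgb_src.dtype)
    depth_out = np.zeros(h * w, dtype=float)
    mask = np.zeros(h * w, dtype=bool)
    rgb_out[flat_t] = rgb_src.reshape(-1, 3)[idx]
    depth_out[flat_t] = z_tgt[idx]
    mask[flat_t] = True
    return rgb_out.reshape(h, w, 3), depth_out.reshape(h, w), mask.reshape(h, w)


def composite(warps):
    """Merge several warped frames, keeping the nearest valid depth per pixel."""
    rgb = np.zeros_like(warps[0][0])
    depth = np.full(warps[0][1].shape, np.inf)
    mask = np.zeros(warps[0][2].shape, dtype=bool)
    for r, d, m in warps:
        closer = m & (d < depth)
        rgb[closer] = r[closer]
        depth[closer] = d[closer]
        mask |= m
    depth[~mask] = 0.0  # pixels no frame covers stay empty and flagged invalid
    return rgb, depth, mask
```

In this sketch, the composite image, composite depth, and mask would together play the role of the conditioning input that the caption attributes to the Warp ControlNet. Pixels where the mask is false are regions no earlier frame covers, so the diffusion model would have to fill them in unguided.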