StyleSplat: 3D Object Style Transfer with Gaussian Splatting

Sahil Jain, Avik Kuthiala, Prabhdeep Singh Sethi, Prakanshul Saxena

TL;DR

StyleSplat addresses the challenge of per-object, fast style transfer in 3D scenes represented by 3D Gaussian splats. It introduces a three-stage pipeline—2D mask generation and tracking, 3D Gaussian training/segmentation, and 3D style transfer—that localizes stylistic changes to user-selected objects by finetuning only the spherical harmonic color coefficients via a nearest-neighbor feature matching loss. The method enables multiple objects to receive different styles with high fidelity and efficiency (often under a minute on a single A100 GPU), outperforming prior approaches in localization accuracy and speed. By leveraging view-consistent 3D segmentation and SH-based color encoding, StyleSplat minimizes leakage and preserves geometric priors while delivering diverse artistic stylizations across varied scenes.
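The localization idea above (update only the spherical harmonic color coefficients of the selected objects, leaving geometry and all other Gaussians untouched) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Gaussian` class, the `stylization_step` function, the toy gradients, and the learning rate are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Gaussian:
    # Hypothetical, simplified record for one 3D Gaussian.
    position: List[float]  # geometry: never modified during stylization
    sh: List[float]        # spherical harmonic color coefficients
    object_id: int         # label assigned by the 3D segmentation stage


def stylization_step(gaussians: List[Gaussian],
                     sh_grads: List[List[float]],
                     selected_ids: Set[int],
                     lr: float = 0.1) -> None:
    """One gradient-descent step on the SH coefficients of selected objects only."""
    for g, grad in zip(gaussians, sh_grads):
        if g.object_id not in selected_ids:
            continue  # Gaussians of other objects keep their original color
        g.sh = [c - lr * d for c, d in zip(g.sh, grad)]


# Toy scene: two Gaussians belonging to two different objects.
scene = [
    Gaussian([0.0, 0.0, 0.0], [0.5, 0.5, 0.5], object_id=1),
    Gaussian([1.0, 0.0, 0.0], [0.2, 0.2, 0.2], object_id=2),
]
# Assumed gradients of a style loss with respect to each Gaussian's SH coefficients.
grads = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

stylization_step(scene, grads, selected_ids={1}, lr=0.1)
print(scene[0].sh, scene[1].sh)
```

Because positions are never written and unselected Gaussians are skipped, the style cannot leak into the background in this toy setting; assigning a different style per object would amount to running the step with a different loss for each set of selected ids.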

Abstract

Recent advancements in radiance fields have opened new avenues for creating high-quality 3D assets and scenes. Style transfer can enhance these 3D assets with diverse artistic styles, transforming creative expression. However, existing techniques are often slow or unable to localize style transfer to specific objects. We introduce StyleSplat, a lightweight method for stylizing 3D objects in scenes represented by 3D Gaussians from reference style images. Our approach first learns a photorealistic representation of the scene using 3D Gaussian splatting while jointly segmenting individual 3D objects. We then use a nearest-neighbor feature matching loss to finetune the Gaussians of the selected objects, aligning their spherical harmonic coefficients with the style image to ensure consistency and visual appeal. StyleSplat allows for quick, customizable style transfer and localized stylization of multiple objects within a scene, each with a different style. We demonstrate its effectiveness across various 3D scenes and styles, showcasing enhanced control and customization in 3D creation.
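As a rough illustration of the nearest-neighbor feature matching loss mentioned above, the sketch below uses its common formulation: each feature vector of the rendered image is matched to its closest style feature under cosine distance, and the matched distances are averaged. The function names and toy feature vectors are assumptions for illustration. In practice the features would come from a pretrained image network, and the abstract does not specify those details.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity, with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-8)


def nnfm_loss(render_feats: List[Sequence[float]],
              style_feats: List[Sequence[float]]) -> float:
    """Average, over rendered features, of the distance to the nearest style feature."""
    total = 0.0
    for f in render_feats:
        total += min(cosine_distance(f, s) for s in style_feats)
    return total / len(render_feats)


# Toy features: every rendered feature is parallel to some style feature.
aligned = nnfm_loss([[1.0, 0.0], [0.0, 2.0]],
                    [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# Toy features: the rendered feature is orthogonal to the only style feature.
orthogonal = nnfm_loss([[1.0, 0.0]], [[0.0, 1.0]])
print(aligned, orthogonal)
```

Cosine distance ignores feature magnitude, so the aligned case gives a loss close to zero even though the vectors differ in length, while the orthogonal case gives a loss of one.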

Paper Structure

This paper contains 20 sections, 6 equations, and 8 figures.

Figures (8)

  • Figure 1: We introduce StyleSplat, an approach for lightweight, customizable, and localized stylization of 3D objects from reference style images. Our approach first learns a photorealistic representation of the scene with 3D Gaussian splatting while jointly segmenting the scene into individual 3D objects using 2D masks. We then employ a nearest-neighbor feature matching loss to finetune and stylize the user-specified objects using the provided style images.
  • Figure 2: Our approach for StyleSplat. We first use an off-the-shelf segmentation and tracking model [cheng2023tracking] to generate view-consistent 2D object masks. Then, we use the multi-view images to learn the geometry and color of 3D Gaussians while simultaneously learning a per-Gaussian feature vector. These feature vectors are decoded into object labels using a linear classifier to collect the Gaussians corresponding to the user-specified objects. The SH coefficients of these selected Gaussians are finetuned to align with the style image using NNFM loss.
  • Figure 3: Effect of 3D segmentation on localized style transfer. The first column shows the initial 3D object. The second column demonstrates the limitations of using a masked loss similar to previous radiance field-based approaches [lahiri2023s2rf, coarf, li2023arfpluscontrollingperceptualfactors]. 2D masks can be inconsistent across views and introduce errors, leading to artifacts in different parts of the scene due to incorrect Gaussians being modified. The third column illustrates the benefits of training with a collection of noisy masks to learn a view-consistent feature vector per Gaussian, effectively correcting these errors and avoiding leakage.
  • Figure 4: 3D segmentation results. Panel (a) shows the ground truth image, (b) displays the masks extracted using SAM and DEVA, (c) visualizes the learned feature vectors of all objects in the scene, (d) presents the extracted object, and (e) illustrates the final stylized result.
  • Figure 5: Single-object style transfer on the bear and pinecone scenes, using style images with different artistic styles and compositions. Our approach localizes style transfer to the selected objects without affecting the background.
  • ...and 3 more figures
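
The Figure 2 caption describes decoding per-Gaussian feature vectors into object labels with a linear classifier and collecting the Gaussians of the user-specified objects. A minimal sketch of that selection step follows. The function names, the toy classifier weights, and the two-dimensional features are hypothetical; the real feature dimension and training procedure are not given in this summary.

```python
from typing import List, Sequence, Set


def classify(feature: Sequence[float],
             weights: List[Sequence[float]],
             bias: Sequence[float]) -> int:
    """Return the argmax label of the linear classifier logits W f + b."""
    logits = [sum(w * x for w, x in zip(row, feature)) + b
              for row, b in zip(weights, bias)]
    return max(range(len(logits)), key=logits.__getitem__)


def select_gaussians(features: List[Sequence[float]],
                     weights: List[Sequence[float]],
                     bias: Sequence[float],
                     target_labels: Set[int]) -> List[int]:
    """Collect indices of Gaussians whose decoded label is one of the targets."""
    return [i for i, f in enumerate(features)
            if classify(f, weights, bias) in target_labels]


# Toy per-Gaussian feature vectors (assumed 2-D) and an assumed 2-class classifier.
features = [[2.0, 0.1], [0.1, 3.0], [1.5, 0.2]]
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]

selected = select_gaussians(features, weights, bias, target_labels={0})
print(selected)
```

Only the Gaussians returned by such a selection would then have their SH coefficients finetuned; since the label is attached to the Gaussian itself and not to any single 2D mask, the selection is the same from every viewpoint.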