3DOT: Texture Transfer for 3DGS Objects from a Single Reference Image

Xiao Cao, Beibei Lin, Bo Wang, Zhiyong Huang, Robby T. Tan

TL;DR

3DOT addresses the challenge of transferring texture from a single 2D image to a fixed 3D Gaussian Splatting object by introducing a progressive generation pipeline, view-consistency gradient guidance, and prompt-tuning-based gradient guidance. The method propagates edits from a reference view to neighboring views, enforces cross-view coherence through gradient-guided diffusion with cross-attention and text cues, and preserves texture characteristics by learning a texture-difference token aligned via CLIP space. Evaluations on 360-degree and face-forwarding datasets show state-of-the-art performance in both qualitative and quantitative metrics, with favorable editing speed compared to NeRF-based approaches. The work demonstrates robust texture identity preservation and view-consistent edits across unseen viewpoints, offering a practical, efficient solution for texture transfer in 3D editing scenarios.
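The propagation from a reference view to neighboring views can be pictured as an ordering problem: views closest to the reference are edited first, and later views can build on already-edited neighbors. The sketch below is only an illustration under stated assumptions, not the authors' pipeline. It assumes each view is described by a single camera azimuth in degrees, and the function name `progressive_order` is hypothetical.

```python
def progressive_order(azimuths_deg, ref_index):
    """Return view indices sorted by angular distance from the reference view.

    Assumption (not from the paper): each view is summarized by one azimuth
    angle in degrees, and "neighboring" means small angular distance.
    """
    ref = azimuths_deg[ref_index]

    def angular_distance(a):
        # Wrap the difference into [0, 180] so 350 and 10 degrees are 20 apart.
        d = abs(a - ref) % 360.0
        return min(d, 360.0 - d)

    # sorted() is stable, so views at equal distance keep their input order.
    return sorted(range(len(azimuths_deg)),
                  key=lambda i: angular_distance(azimuths_deg[i]))


# Hypothetical camera ring with the reference view at index 0.
azimuths = [0.0, 45.0, 90.0, 180.0, 270.0, 315.0]
order = progressive_order(azimuths, ref_index=0)
print(order)
```

With this ordering, the reference view comes first, followed by its two nearest neighbors on either side, and the view directly opposite the reference comes last.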

Abstract

3D texture swapping allows for the customization of 3D object textures, enabling efficient and versatile visual transformations in 3D editing. While no dedicated method exists, adapted 2D editing and text-driven 3D editing approaches can serve this purpose. However, 2D editing requires frame-by-frame manipulation, causing inconsistencies across views, while text-driven 3D editing struggles to preserve texture characteristics from reference images. To tackle these challenges, we introduce 3DSwapping, a 3D texture swapping method that integrates: 1) progressive generation, 2) view-consistency gradient guidance, and 3) prompt-tuned gradient guidance. To ensure view consistency, our progressive generation process starts by editing a single reference image and gradually propagates the edits to adjacent views. Our view-consistency gradient guidance further reinforces consistency by conditioning the generation model on feature differences between consistent and inconsistent outputs. To preserve texture characteristics, we introduce prompt-tuning-based gradient guidance, which learns a token that precisely captures the difference between the reference image and the 3D object. This token then guides the editing process, ensuring more consistent texture preservation across views. Overall, 3DSwapping integrates these novel strategies to achieve higher-fidelity texture transfer while preserving structural coherence across multiple viewpoints. Extensive qualitative and quantitative evaluations confirm that our three novel components enable convincing and effective 2D texture swapping for 3D objects. Code will be available upon acceptance.
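As a rough intuition for gradient guidance driven by feature differences, the toy sketch below nudges a sample along the negative gradient of a squared feature difference to a reference. It is not the paper's formulation: it assumes the "features" are the sample values themselves (an identity feature extractor), it ignores the diffusion denoising loop in which such guidance would actually be applied, and the names `guidance_step` and `feature_loss` are hypothetical.

```python
def feature_loss(sample, reference):
    """Half the squared distance between sample and reference features."""
    return 0.5 * sum((s - r) ** 2 for s, r in zip(sample, reference))


def guidance_step(sample, reference, scale=0.1):
    """One gradient step that reduces feature_loss.

    The gradient of 0.5 * ||sample - reference||^2 with respect to the
    sample is (sample - reference), so we move against that direction.
    """
    return [s - scale * (s - r) for s, r in zip(sample, reference)]


# Toy data (hypothetical): features of a consistent view and a current sample.
reference_features = [1.0, 0.0, -1.0]
current = [0.0, 0.0, 0.0]

losses = []
for _ in range(5):
    losses.append(feature_loss(current, reference_features))
    current = guidance_step(current, reference_features, scale=0.5)
```

Each step shrinks the feature difference, which is the basic effect a guidance term of this kind aims for; a real system would compute features with a network and backpropagate through it.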

Paper Structure

This paper contains 20 sections, 9 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Comparison of 2D and 3D image-based texture editing methods. Prompts are "moss-covered table" and "pink plastic bear". The 2D method Plug-n-Play [plug_and_play] suffers from view inconsistency; the 3D text-driven editing methods IGS2GS [instructgs2gs] and GaussCtrl [gaussctrl] struggle to preserve texture characteristics. Ours faithfully edits the texture, material appearance, and color.
  • Figure 2: 3DOT. Our framework enables texture transfer from a single image to a 3D object. The left panels illustrate the selection of the reference image using a generative approach. Then, our method employs a progressive generation process guided by view-consistency and prompt-tuning-based gradient guidance to preserve both cross-view consistency and texture identity. $\mathbb{R}$, $\mathbb{T}$, and $\mathbb{T}'$ denote the reference set, text prompt, and learned texture difference token, respectively.
  • Figure 3: Qualitative comparison on 360-degree scenes (material and color edits): Our 3DOT method faithfully edits 3D objects' texture based on reference images.
  • Figure 4: Qualitative comparison on 360-degree scenes (complicated texture edits): Our 3DOT method successfully edits 3D objects' texture to complicated reference textures.
  • Figure 5: Qualitative comparison on face-forwarding scenes: Our 3DOT method faithfully edits 3D objects' texture to reference textures and generates the most plausible texture edits for unseen views.
  • ...and 1 more figure