DragTex: Generative Point-Based Texture Editing on 3D Mesh

Yudi Zhang, Qi Xu, Lei Zhang

TL;DR

DragTex addresses the challenge of editing textures directly on 3D meshes with precise spatial control. It introduces a diffusion-based pipeline that blends noisy latents across views in the region near the deformed silhouette, keeping neighboring views locally consistent, and it fine-tunes the decoder to reduce reconstruction error in non-drag regions. A key contribution is training LoRA once on multi-view images rather than separately for each view, which markedly reduces training time and makes interactive editing more efficient. Additional robustness comes from static control points to preserve geometry. Together, these components enable plausible, drag-guided texture edits that align with user intent and maintain multi-view coherence, with demonstrated improvements over the naive approach of applying 2D drag editing view by view.

Abstract

Creating 3D textured meshes using generative artificial intelligence has garnered significant attention recently. While existing methods support text-based texture generation or editing on 3D meshes, they often struggle to precisely control pixels of texture images through more intuitive interaction. Although 2D images can be edited generatively using drag interaction, applying this type of method directly to 3D mesh textures still leads to issues such as a lack of local consistency among multiple views, error accumulation, and long training times. To address these challenges, we propose a generative point-based 3D mesh texture editing method called DragTex. This method uses a diffusion model to blend locally inconsistent textures in the region near the deformed silhouette between different views, enabling locally consistent texture editing. In addition, we fine-tune a decoder to reduce reconstruction errors in the non-drag region, thereby mitigating overall error accumulation. Moreover, we train LoRA using multi-view images instead of training on each view individually, which significantly shortens the training time. The experimental results show that our method effectively supports dragging textures on 3D meshes and generates plausible textures that align with the intent of the drag interaction.
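
The blending idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it only shows a per-element convex blend of two latent grids under a soft mask, which is one common way to merge content from two sources in a local region. The function name `blend_latents`, the list-of-lists layout, and the mask values are all assumptions made for illustration. In the paper the blending is applied to noisy latents inside a diffusion model, near the deformed silhouette between views.

```python
def blend_latents(dragged, reference, mask):
    """Blend two equally sized 2D grids element by element.

    Hypothetical helper for illustration only.
    mask = 1.0 keeps the dragged value, mask = 0.0 keeps the reference
    value, and values in between mix the two linearly.
    """
    if not (len(dragged) == len(reference) == len(mask)):
        raise ValueError("all inputs must have the same number of rows")

    blended = []
    for d_row, r_row, m_row in zip(dragged, reference, mask):
        if not (len(d_row) == len(r_row) == len(m_row)):
            raise ValueError("all rows must have the same length")
        # Convex combination: m * dragged + (1 - m) * reference.
        blended.append(
            [m * d + (1.0 - m) * r for d, r, m in zip(d_row, r_row, m_row)]
        )
    return blended


# Toy 2x2 example: the mask is 1 inside the (assumed) drag region,
# 0 far from it, and fractional in a transition band.
dragged_latent = [[1.0, 1.0], [1.0, 1.0]]
reference_latent = [[0.0, 0.0], [0.0, 0.0]]
soft_mask = [[1.0, 0.5], [0.0, 0.25]]

result = blend_latents(dragged_latent, reference_latent, soft_mask)
```

A soft mask, rather than a binary one, avoids a hard seam at the region boundary. How the mask is derived from the silhouette, and at which denoising steps the blend is applied, are details of the paper's method that this sketch does not attempt to reproduce.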

Paper Structure

This paper contains 13 sections, 5 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Within a freely selected view, DragTex supports an intuitive drag interaction directly on a textured 3D mesh (indicated by the white arrow). It produces a new texture that aligns with the intended drag movement and stays consistent across multiple views (see images enclosed by purple boxes), unlike the texture generated with the naive method (see images enclosed by blue dashed boxes).
  • Figure 2: Overview of our method. Beginning with multi-view LoRA pre-training, the dragged texture is generated via the dragging branch and the blending branch. Our method involves optimizing the training strategy, fusing noisy latent images, and reconstructing details outside the drag region (depicted by dashed boxes) to achieve the desired texture.
  • Figure 3: An example of reconstruction error accumulation. After the first drag in view-1, the area outside the drag region is not well reconstructed, as shown in view-2. Another drag in view-2 then introduces further errors.
  • Figure 4: The results of adding static control points (green points in the second row).
  • Figure 5: Our results on different kinds of textured meshes. The first row shows the original images of the selected views and the user's drag, and the second and third rows show the rendered images of the dragged texture in the selected view and in another related view.
  • ...and 3 more figures